Abstract. In a functional language, the dominant control-flow mechanism is function
call and return. Most higher-order flow analyses, including k-CFA, do not handle call and
return well: they remember only a bounded number of pending calls because they approximate
programs with control-flow graphs. Call/return mismatch introduces precision-degrading
spurious control-flow paths and increases the analysis time.

We describe CFA2, the first flow analysis with precise call/return matching in the
presence of higher-order functions and tail calls. We formulate CFA2 as an abstract
interpretation of programs in continuation-passing style and describe a sound and complete
summarization algorithm for our abstract semantics. A preliminary evaluation shows that
CFA2 gives more accurate data-flow information than 0CFA and 1CFA.



Introduction 

Higher-order functional programs can be analyzed using analyses such as the k-CFA family
[26]. These algorithms approximate the valid control-flow paths through the program as the
set of all paths through a finite graph of abstract machine states, where each state represents
a program point plus some amount of abstracted environment and control context.

In fact, this is not a particularly tight approximation. The set of paths through a 
finite graph is a regular language. However, the execution traces produced by recursive 
function calls are strings in a context-free language. Approximating this control flow with 
regular-language techniques permits execution paths that do not properly match calls with 
returns. This is particularly harmful when analyzing higher-order languages, since flowing 
functional values down these spurious paths can give rise to further "phantom" control-flow 
structure, along which functional values can then flow, and so forth, in a destructive spiral 
that not only degrades precision but drives up the cost of the analysis. 

Pushdown models of programs can match an unbounded number of calls and returns, 
tightening up the set of possible executions to strings in a context-free language. Such 
models have long been used for first-order languages. The functional approach of Sharir
and Pnueli [25] computes transfer functions for whole procedures by composing transfer
functions of their basic blocks. Then, at a call node these functions are used to compute the
data-flow value of the corresponding return node directly. This "summary-based" technique
has seen widespread use [5, 23]. Other pushdown models include Recursive State Machines
[2] and Pushdown Systems [3, 10].

In this paper, we propose CFA2, a pushdown model of higher-order programs.¹ Our
contributions can be summarized as follows:

• CFA2 is a flow analysis with precise call/return matching that can be used in the
compilation of both typed and untyped languages. No existing analysis for functional languages
enjoys all of these properties. k-CFA and its variants support limited call/return matching,
bounded by the size of k (section 3.1). Type-based flow analysis with polymorphic
subtyping [21, 22] also supports limited call/return matching, and applies to typed
languages only (section 7).

• CFA2 uses a stack and a heap for variable binding. Variable references are looked up in
one or the other, depending on where they appear in the source code. Most references in
typical programs are read from the stack, which results in significant precision gains. Also,
CFA2 can filter certain bindings off the stack to sharpen precision (section 4). k-CFA
with abstract garbage collection [20] cannot infer that it is safe to remove these bindings.
Last, the stack makes CFA2 resilient to syntax changes like η-expansion (section 4.1). It
is well known that k-CFA is sensitive to such changes [30, 31].



We formulate CFA2 as an abstract interpretation of programs in continuation-passing
style (CPS). The abstract semantics uses a stack of unbounded height. Hence, the abstract
state space is infinite, unlike k-CFA. To analyze the state space, we extend the functional
approach of Sharir and Pnueli [25]. The resulting algorithm is a search-based variant
of summarization that can handle higher-order functions and tail recursion. Currently,
CFA2 does not handle first-class-control operators such as call/cc (section 5).
We have implemented 0CFA, 1CFA and CFA2 in the Twobit Scheme compiler [6]. Our
experimental results show that CFA2 is more precise than 0CFA and 1CFA. Also, CFA2
usually visits a smaller state space (section 6).



1. Preliminary definitions and notational conventions 

In flow analysis of λ-calculus-based languages, a program is usually converted to an
intermediate form where all subexpressions are named before it is analyzed. This form can be CPS,
administrative normal form [11], or ordinary direct-style λ-calculus where each expression
has a unique label. Selecting among these is mostly a matter of taste, and an analysis using
one form can be changed to use another form without much effort.

This work uses CPS. We opted for CPS because it makes contexts explicit, as
continuation-lambda terms. Moreover, call/cc, which we wish to support in the future, is directly
expressible in CPS without the need for a special primitive operator.

In this section we describe our CPS language. For brevity, we develop the theory of
CFA2 in the untyped λ-calculus. Primitive data, explicit recursion and side-effects can be
added using standard techniques [26, ch. 3] [19, ch. 9]. Compilers that use CPS [16, 29]



¹ CFA2 stands for "a Context-Free Approach to Control-Flow Analysis". We use "context-free" with its
usual meaning from language theory, to indicate that CFA2 approximates valid executions as strings in a
context-free language. Unfortunately, "context-free" means something else in program analysis. To avoid
confusion, we use "monovariant" and "polyvariant" when we refer to the abstraction of calling context in
program analysis. CFA2 is polyvariant (aka context-sensitive), because it analyzes different calls to the same
function in different environments.

$$
\begin{array}{rcl}
v \in Var & = & UVar + CVar \\
u \in UVar & = & \text{a set of identifiers} \\
k \in CVar & = & \text{a set of identifiers} \\
\psi \in Lab & = & ULab + CLab \\
l \in ULab & = & \text{a set of labels} \\
\gamma \in CLab & = & \text{a set of labels} \\
lam \in Lam & = & ULam + CLam \\
ulam \in ULam & ::= & [\![(\lambda_l\,(u\ k)\ call)]\!] \\
clam \in CLam & ::= & [\![(\lambda_\gamma\,(u)\ call)]\!] \\
call \in Call & = & UCall + CCall \\
UCall & ::= & [\![(f\ e\ q)^l]\!] \\
CCall & ::= & [\![(q\ e)^\gamma]\!] \\
g \in Exp & = & UExp + CExp \\
f, e \in UExp & = & ULam + UVar \\
q \in CExp & = & CLam + CVar \\
pr \in Program & ::= & ULam
\end{array}
$$

Figure 1: Partitioned CPS



usually partition the terms in a program in two disjoint sets, the user and the continuation 
set, and treat user terms differently from continuation terms. 

We adopt this partitioning for our language (Fig. 1). Variables, lambdas and calls get
labels from ULab or CLab. Labels are pairwise distinct. User lambdas take a user argument
and the current continuation; continuation lambdas take only a user argument. We apply
an additional syntactic constraint: the only continuation variable that can appear free in
the body of a user lambda $(\lambda_l\,(u\ k)\ call)$ is $k$. This simple constraint forbids first-class
control [24]. Intuitively, we get such a program by CPS-converting a direct-style program
without call/cc.

We assume that all variables in a program have distinct names. Concrete syntax enclosed
in $[\![\cdot]\!]$ denotes an item of abstract syntax. Functions with a '?' subscript are
predicates, e.g., $Var_?(e)$ returns true if $e$ is a variable and false otherwise.

We use two notations for tuples, $(e_1, \ldots, e_n)$ and $\langle e_1, \ldots, e_n \rangle$, to avoid confusion when
tuples are deeply nested. We use the latter for lists as well; ambiguities will be resolved by
the context. Lists are also described by a head-tail notation, e.g., $3 :: \langle 1, 3, -47 \rangle$.

CFA2 treats references to the same variable differently in different contexts. We split
references into two categories: stack and heap references. In direct style, if a reference appears
at the same nesting level as its binder, then it is a stack reference; otherwise it is a heap
reference. For example, the program $(\lambda_1 (x)\ (\lambda_2 (y)\ (x\ (x\ y))))$ has a stack reference to
$y$ and two heap references to $x$. Intuitively, only heap references may escape. When a
program $p$ is CPS-converted to a program $p'$, stack (resp. heap) references in $p$ remain stack
(resp. heap) references in $p'$. All references added by the transform are stack references.

We can give an equivalent definition of stack and heap references directly in CPS,
without referring to the original direct-style program. Labels can be split into disjoint
sets according to the innermost user lambda that contains them. In the program
$(\lambda_1 (x\ k1)\ (k1\ (\lambda_2 (y\ k2)\ (x\ y\ (\lambda_3 (u)\ (x\ u\ k2)^4))^5))^6)$, which is the CPS translation of the
previous program, these sets are $\{1, 6\}$ and $\{2, 3, 4, 5\}$. The "label to variable" map $LV(\psi)$
returns all the variables bound by any lambdas that belong in the same set as $\psi$, e.g.,
$LV(4) = \{y, k2, u\}$ and $LV(6) = \{x, k1\}$. We use this map to model stack behavior,
because all continuation lambdas that "belong" to a given user lambda $\lambda_l$ get closed by
extending $\lambda_l$'s stack frame (cf. section 4). Notice that, for any $\psi$, $LV(\psi)$ contains exactly
one continuation variable. Using $LV$, we give the following definition.






D. VARDOULAKIS AND O. SHIVERS 



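$$
\begin{array}{ll}
[\mathrm{UEA}] & ([\![(f\ e\ q)^l]\!],\ \beta,\ ve,\ t) \to (proc,\ d,\ c,\ ve,\ l :: t) \\
 & proc = \mathcal{A}(f, \beta, ve) \\
 & d = \mathcal{A}(e, \beta, ve) \\
 & c = \mathcal{A}(q, \beta, ve) \\[4pt]
 & \mathcal{A}(g, \beta, ve) \triangleq
   \begin{cases} (g, \beta) & Lam_?(g) \\ ve(g, \beta(g)) & Var_?(g) \end{cases} \\[4pt]
[\mathrm{UAE}] & (proc,\ d,\ c,\ ve,\ t) \to (call,\ \beta',\ ve',\ t) \\
 & proc = ([\![(\lambda_l\,(u\ k)\ call)]\!],\ \beta) \\
 & \beta' = \beta[u \mapsto t][k \mapsto t] \\
 & ve' = ve[(u, t) \mapsto d][(k, t) \mapsto c] \\[4pt]
[\mathrm{CEA}] & ([\![(q\ e)^\gamma]\!],\ \beta,\ ve,\ t) \to (proc,\ d,\ ve,\ \gamma :: t) \\
 & proc = \mathcal{A}(q, \beta, ve) \\
 & d = \mathcal{A}(e, \beta, ve) \\[4pt]
[\mathrm{CAE}] & (proc,\ d,\ ve,\ t) \to (call,\ \beta',\ ve',\ t) \\
 & proc = ([\![(\lambda_\gamma\,(u)\ call)]\!],\ \beta) \\
 & \beta' = \beta[u \mapsto t] \\
 & ve' = ve[(u, t) \mapsto d]
\end{array}
$$

Concrete domains:

$$
\begin{array}{rcl}
\varsigma \in State & = & Eval + Apply \\
Eval & = & UEval + CEval \\
UEval & = & UCall \times BEnv \times VEnv \times Time \\
CEval & = & CCall \times BEnv \times VEnv \times Time \\
Apply & = & UApply + CApply \\
UApply & = & UClos \times UClos \times CClos \times VEnv \times Time \\
CApply & = & CClos \times UClos \times VEnv \times Time \\
Clos & = & UClos + CClos \\
d \in UClos & = & ULam \times BEnv \\
c \in CClos & = & (CLam \times BEnv) + halt \\
\beta \in BEnv & = & Var \rightharpoonup Time \\
ve \in VEnv & = & Var \times Time \rightharpoonup Clos \\
t \in Time & = & Lab^*
\end{array}
$$

Figure 2: Concrete semantics and domains for Partitioned CPS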



Definition 1.1 (Stack and heap references). 

• Let $\psi$ be a call site that refers to a variable $v$. The predicate $S_?(\psi, v)$ holds iff $v \in LV(\psi)$.
We call $v$ a stack reference.

• Let $\psi$ be a call site that refers to a variable $v$. The predicate $H_?(\psi, v)$ holds iff $v \notin LV(\psi)$.
We call $v$ a heap reference.

• $v$ is a stack variable, written $S_?(v)$, iff all its references satisfy $S_?$.

• $v$ is a heap variable, written $H_?(v)$, iff some of its references satisfy $H_?$.

Then, $S_?(5, y)$ holds because $y \in \{y, k2, u\}$ and $H_?(5, x)$ holds because $x \notin \{y, k2, u\}$.
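
As an illustration of Definition 1.1, the following Python sketch computes LV and classifies the references of the CPS example program above. The tuple encoding of terms (tags "ulam", "clam", "ucall", "ccall") and the function name label_vars are assumptions made for this sketch; they are not notation from the paper.

```python
# Hypothetical term encoding (an assumption of this sketch):
#   ("ulam", label, u, k, call)   user lambda
#   ("clam", label, u, call)      continuation lambda
#   ("ucall", label, f, e, q)     user call site
#   ("ccall", label, q, e)        continuation call site
# Variables are plain strings.

def label_vars(term, frame=None, lv=None, refs=None):
    """Return (lv, refs): lv maps each label to the variable set of its
    innermost user lambda; refs lists (call-site label, variable) pairs."""
    lv = {} if lv is None else lv
    refs = [] if refs is None else refs
    tag, label = term[0], term[1]
    if tag == "ulam":
        frame = {term[2], term[3]}      # a user lambda starts a new set
    elif tag == "clam":
        frame.add(term[2])              # a continuation lambda extends it
    lv[label] = frame
    for g in ([term[-1]] if tag in ("ulam", "clam") else term[2:]):
        if isinstance(g, tuple):
            label_vars(g, frame, lv, refs)
        else:
            refs.append((label, g))     # a variable referenced at this call site
    return lv, refs

# The CPS program from the text, with labels 1 to 6.
PROG = ("ulam", 1, "x", "k1",
        ("ccall", 6, "k1",
         ("ulam", 2, "y", "k2",
          ("ucall", 5, "x", "y",
           ("clam", 3, "u", ("ucall", 4, "x", "u", "k2"))))))

lv, refs = label_vars(PROG)
stack_refs = [(psi, v) for psi, v in refs if v in lv[psi]]       # S?(psi, v)
heap_refs = [(psi, v) for psi, v in refs if v not in lv[psi]]    # H?(psi, v)
heap_vars = {v for _, v in heap_refs}                            # H?(v)

print(lv[4], lv[6])
print(heap_refs, heap_vars)
```

Under this encoding, x is the only heap variable: both of its references occur inside the user lambda labeled 2, which does not bind it.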



2. Concrete Semantics 

Execution in Partitioned CPS is guided by the semantics of Fig. 2. In the terminology of
abstract interpretation, this semantics is called the concrete semantics. In order to find
properties of a program at compile time, one needs to derive a computable approximation
of the concrete semantics, called the abstract semantics. CFA2 and k-CFA are such
approximations.

Execution traces alternate between Eval and Apply states. At an Eval state, we evaluate 
the subexpressions of a call site before performing a call. At an Apply, we perform the call. 

The last component of each state is a time, which is a sequence of call sites. Eval to 
Apply transitions increment the time by recording the label of the corresponding call site. 
Apply to Eval transitions leave the time unchanged. Thus, the time t of a state reveals the 
call sites along the execution path to that state. 

Times indicate points in the execution when variables are bound. The binding environment
$\beta$ is a partial function that maps variables to their binding times. The variable
environment $ve$ maps variable-time pairs to values. To find the value of a variable $v$, we
look up the time $v$ was put in $\beta$, and use that to search for the actual value in $ve$.

Let's look at the transitions more closely. At a UEval state with call site $(f\ e\ q)^l$, we
evaluate $f$, $e$ and $q$ using the function $\mathcal{A}$. Lambdas are paired up with $\beta$ to become closures,
while variables are looked up in $ve$ using $\beta$. We add the label $l$ in front of the current time
and transition to a UApply state (rule [UEA]).

From UApply to Eval, we bind the formals of a procedure $([\![(\lambda_l\,(u\ k)\ call)]\!], \beta)$ to the
arguments and jump to its body. The new binding environment $\beta'$ extends the procedure's
environment, with $u$ and $k$ mapped to the current time. The new variable environment $ve'$
maps $(u, t)$ to the user argument $d$, and $(k, t)$ to the continuation $c$ (rule [UAE]).

The remaining two transitions are similar. We use halt to denote the top-level continuation
of a program $pr$. The initial state $\mathcal{I}(pr)$ is $((pr, \emptyset), input, halt, \emptyset, \langle\rangle)$, where $input$ is a
closure of the form $([\![(\lambda_l\,(u\ k)\ call)]\!], \emptyset)$. The initial time is the empty sequence of call sites.
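
The four transitions can be sketched as a small Python interpreter. This is a minimal sketch under stated assumptions, not the authors' implementation: terms use a hypothetical tuple encoding, integer literals are added as self-evaluating values (the text notes that primitive data can be added by standard techniques), and the labels 0, 10, 11, 12 and 30 are arbitrary.

```python
HALT = "halt"

def A(g, benv, venv):
    """Evaluate an atomic expression, following the function A of Fig. 2."""
    if isinstance(g, int):           # assumption: integer literals denote themselves
        return g
    if isinstance(g, str):           # variable: binding time from benv, value from venv
        return venv[(g, benv[g])]
    return (g, benv)                 # lambda: pair it with benv to form a closure

def step(state):
    tag = state[0]
    if tag == "eval":
        _, call, benv, venv, t = state
        if call[0] == "ucall":                                # [UEA]
            _, l, f, e, q = call
            return ("uapply", A(f, benv, venv), A(e, benv, venv),
                    A(q, benv, venv), venv, (l,) + t)
        _, gamma, q, e = call                                 # [CEA]
        return ("capply", A(q, benv, venv), A(e, benv, venv), venv, (gamma,) + t)
    if tag == "uapply":                                       # [UAE]
        _, (lam, benv), d, c, venv, t = state
        _, _, u, k, body = lam
        return ("eval", body, {**benv, u: t, k: t},
                {**venv, (u, t): d, (k, t): c}, t)
    _, (lam, benv), d, venv, t = state                        # [CAE]
    _, _, u, body = lam
    return ("eval", body, {**benv, u: t}, {**venv, (u, t): d}, t)

def run(program, input_closure):
    """Iterate from the initial state until halt is applied; return that state."""
    state = ("uapply", (program, {}), input_closure, HALT, {}, ())
    while not (state[0] == "capply" and state[1] == HALT):
        state = step(state)
    return state                     # ("capply", halt, result, venv, time)

# Apply the identity function twice and return the second result.
ID = ("ulam", 3, "x", "k", ("ccall", 30, "k", "x"))
PROG = ("ulam", 0, "id", "h",
        ("ucall", 10, "id", 1,
         ("clam", 1, "n1",
          ("ucall", 11, "id", 2,
           ("clam", 2, "n2", ("ccall", 12, "h", "n2"))))))

final = run(PROG, (ID, {}))
result, venv, time = final[2], final[3], final[4]
x_bindings = {t: v for (name, t), v in venv.items() if name == "x"}
print(result, time, x_bindings)
```

In this run the variable x ends up with two entries in the variable environment, one per call to the identity function, each keyed by the time of the call. This is the sense in which times distinguish bindings of the same variable.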

CPS-based compilers may or may not use a stack for the final code. Steele's view, illus- 
trated in the Rabbit compiler [29], is that argument evaluation pushes stack and function 
calls are GOTOs. Since arguments in CPS are not calls, argument evaluation is trivial and 
Rabbit never needs to push stack. By this approach, every call in CPS is a tail call. 

An alternative style was used in the Orbit compiler [16]. At every function call, Orbit
pushes a frame for the arguments. By this approach, tail calls are only the calls where the 
continuation argument is a variable. These CPS call sites were in tail position in the initial 
direct-style program. CEval states where the operator is a variable are calls to the current 
continuation with a return value. Orbit pops the stack at tail calls and before calling the 
current continuation. 

We will see later that the abstract semantics of CFA2 uses a stack, like Orbit. How- 
ever, CFA2 computes safe flow information which can be used by both aforementioned 
approaches. The workings of the abstract interpretation are independent of what style an 
implementor chooses for the final code. 



3. Limitations of k-CFA

In this section, we discuss the main causes of imprecision and inefficiency in k-CFA. Our
motivation in developing CFA2 is to create an analysis that overcomes these limitations.
We assume some familiarity with k-CFA, and abstract interpretation in general. Detailed
descriptions of these topics can be found in [19, 26]. We use Scheme syntax for our
example programs.

3.1. k-CFA does not properly match calls and returns. In order to make the state
space of k-CFA finite, Shivers chose a mechanism similar to the call-strings of Sharir and
Pnueli [25]. Thus, recursive programs introduce approximation by folding an unbounded
number of recursive calls down to a fixed-size call-string. In effect, by applying k-CFA to a
higher-order program, we turn it into a finite-state machine. Taken to the extreme, when
k is zero, a function can return to any of its callers, not just to the last one.

For example, consider the function len that computes the length of a list. Fig. 3 shows
the code for len, its CPS translation and the associated control-flow graph. In the graph,
the top level of the program is presented as a function called main. Function entry and exit
nodes are rectangles with sharp corners. Inner nodes are rectangles with rounded corners.
Each call site is represented by a call node and a corresponding return node, which contains
the variable to which the result of the call is assigned. Each function uses a local variable
ret for its return value. Solid arrows are intraprocedural steps. Dashed arrows go from
call sites to function entries and from function exits to return points. There is no edge
between call and return nodes; a call reaches its corresponding return only if the callee

(define (len l)
  (if (pair? l)
      (+ 1 (len (cdr l)))
      0))
(len '(3))

⇓ CPS

(define (len l k)
  (pair? l
    (λ(test)
      (if test
          (λ()
            (cdr l
              (λ(rest)
                (len rest
                  (λ(ans)
                    (+ 1 ans k))))))
          (λ() (k 0))))))
(len '(3) halt)

[Control-flow graph not reproduced. Surviving node labels: function entries main() and len(l);
the call node len '(3) with its return node ret; node 6: test := pair? l; node 7: branch on
test; node 8: ret := 0.]

Figure 3: 0CFA on len



terminates. A monovariant analysis, such as 0CFA, considers every path from 1 to 4 to be
a valid execution. In particular, it cannot exclude the path 1, 2, 5, 6, 7, 9, 10, 5, 6, 7, 8, 13,
3, 4. By following such a path, the program will terminate with a non-empty stack. It is
clear that k-CFA cannot help much with optimizations that require accurate calculation of
the stack change between program states, such as stack allocation of closure environments.
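
The difference between a valid path and a spurious one can be made concrete with a stack of pending calls. The sketch below is only an illustration: the event encoding and the site names "main" and "rec" are hypothetical and do not correspond to node numbers in Fig. 3.

```python
def well_matched(events):
    """True iff every return matches the most recent pending call
    and no call is left pending at the end of the path."""
    pending = []
    for kind, site in events:
        if kind == "call":
            pending.append(site)
        elif not pending or pending.pop() != site:
            return False                 # a return to the wrong caller
    return not pending

# main calls len, len calls itself once, and both calls return in order.
real = [("call", "main"), ("call", "rec"), ("ret", "rec"), ("ret", "main")]

# The inner call returns directly to main, leaving the recursive call pending.
spurious = [("call", "main"), ("call", "rec"), ("ret", "main")]

print(well_matched(real), well_matched(spurious))
```

A finite graph has no such stack, so it accepts both sequences. Recognizing only the well-matched ones requires an unbounded stack, which is why the set of valid paths is context-free rather than regular.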

Spurious flows caused by call/return mismatch affect traditional data-flow information
as well. For instance, 0CFA constant propagation for the program below cannot spot that
n2 is the constant 2, because 1 also flows to x and is mistakenly returned by the second call
to app. 1CFA also fails, because both calls to id happen in the body of app. 2CFA helps
in this example, but repeated η-expansion of id can trick k-CFA for any k.

(let* ((app (λ(f e) (f e)))
       (id (λ(x) x))
       (n1 (app id 1))
       (n2 (app id 2)))
  (+ n1 n2))

In a non-recursive program, a large enough k can provide accurate call/return matching, 
but this is not desirable because the analysis becomes intractably slow even when k is 1 [30].
Moreover, the ubiquity of recursion in functional programs calls for a static analysis that 
can match an unbounded number of calls and returns. This can be done if we approximate 
programs using pushdown models instead of finite-state machines. 

3.2. The environment problem and fake rebinding. In higher-order languages, many
bindings of the same variable can be simultaneously live. Determining at compile time
whether two references to some variable will be bound in the same run-time environment
is referred to as the environment problem [26]. Consider the following program:

(let ((f (λ(x thunk) (if (number? x) (thunk) (λ₁() x)))))
  (f 0 (f "foo" "bar")))

In the inner call to f, x is bound to "foo" and $\lambda_1$ is returned. We call f again; this time, x
is 0, so we jump through (thunk) to $\lambda_1$, and reference x, which, despite the just-completed
test, is not a number: it is the string "foo". Thus, during abstract interpretation, it is
generally unsafe to assume that a reference has some property just because an earlier reference
had that property. This has an unfortunate consequence: sometimes an earlier reference
provides safe information about the reference at hand and k-CFA does not spot it:

(define (compose-same f x) (f (f x)¹)²)

In compose-same, both references to f are always bound at the same time. However, if
multiple closures flow to f, k-CFA may call one closure at call site 1 and a different closure
at call site 2. This flow never happens at run time.

Imprecise binding information also makes it difficult to infer the types of variable
references. In len, the cdr primitive must perform a run-time check and signal an error if
l is not bound to a pair. This check is redundant since we checked for pair? earlier, and
both references to l are bound in the same environment. If len is called with both pair
and non-pair arguments, k-CFA cannot eliminate the run-time check.

CFA2 tackles this problem by distinguishing stack from heap references. If a reference 
v appears in a static context where we know the current stack frame is its environment 
record, we can be precise. If v appears free in some possibly escaping lambda, we cannot 
predict its extent so we fall back to a conservative approximation. 



3.3. Imprecision increases the running time of the analysis. k-CFA for k > 0 is not
a cheap analysis, both in theory [30] and in practice [27]. Counterintuitively, imprecision
in higher-order flow analyses can increase their running time: imprecision induces spurious
control paths, along which the analysis must flow data, thus creating further spurious paths,
and so on, in a vicious cycle which creates extra work whose only function is to degrade
precision. This is why techniques that aggressively prune the search space, such as ΓCFA [20],
not only increase precision, but can also improve the speed of the analysis.

In the previous subsections, we saw examples of information known at compile time
that k-CFA cannot exploit. CFA2 uses this information. The enhanced precision of CFA2
has a positive effect on its running time (cf. section 6).



4. The CFA2 semantics 

In this section we define the abstract semantics of CFA2. The abstract semantics
approximates the concrete semantics. This means that each concrete state has a corresponding
abstract state. Therefore, each concrete execution, i.e., sequence of states related by $\to$,
has a corresponding abstract execution that computes an approximate answer.

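Each abstract state has a stack. Analyzing recursive programs requires states with
stacks of unbounded size. Thus, the abstract state space is infinite and the standard
algorithms for k-CFA [19, 26] will diverge because they work by enumerating all states. We show
how to solve the stack-size problem in section 5. Here, we describe the abstract semantics
(section 4.1), show how to map concrete to abstract states and prove the correctness of the
abstract semantics (section 4.2).

$$
\begin{array}{ll}
[\mathrm{UEA}] & ([\![(f\ e\ q)^l]\!],\ st,\ h) \leadsto (ulam,\ \hat{d},\ \hat{c},\ st',\ h) \\
 & ulam \in \hat{\mathcal{A}}_u(f, l, st, h) \\
 & \hat{d} = \hat{\mathcal{A}}_u(e, l, st, h) \\
 & \hat{c} = \hat{\mathcal{A}}_k(q, st) \\
 & st' = \begin{cases}
     pop(st) & Var_?(q) \\
     st & Lam_?(q) \wedge (H_?(l, f) \vee Lam_?(f)) \\
     st[f \mapsto \{ulam\}] & Lam_?(q) \wedge S_?(l, f)
   \end{cases} \\[6pt]
[\mathrm{UAE}] & ([\![(\lambda_l\,(u\ k)\ call)]\!],\ \hat{d},\ \hat{c},\ st,\ h) \leadsto (call,\ st',\ h') \\
 & st' = push([u \mapsto \hat{d}][k \mapsto \hat{c}],\ st) \\
 & h' = \begin{cases} h \sqcup [u \mapsto \hat{d}] & H_?(u) \\ h & S_?(u) \end{cases} \\[6pt]
[\mathrm{CEA}] & ([\![(q\ e)^\gamma]\!],\ st,\ h) \leadsto (clam,\ \hat{d},\ st',\ h) \\
 & clam = \hat{\mathcal{A}}_k(q, st) \\
 & \hat{d} = \hat{\mathcal{A}}_u(e, \gamma, st, h) \\
 & st' = \begin{cases} pop(st) & Var_?(q) \\ st & Lam_?(q) \end{cases} \\[6pt]
[\mathrm{CAE}] & ([\![(\lambda_\gamma\,(u)\ call)]\!],\ \hat{d},\ st,\ h) \leadsto (call,\ st',\ h') \\
 & st' = st[u \mapsto \hat{d}] \\
 & h' = \begin{cases} h \sqcup [u \mapsto \hat{d}] & H_?(u) \\ h & S_?(u) \end{cases}
\end{array}
$$

$$
\hat{\mathcal{A}}_u(e, \psi, st, h) \triangleq
\begin{cases} \{e\} & Lam_?(e) \\ st(e) & S_?(\psi, e) \\ h(e) & H_?(\psi, e) \end{cases}
\qquad
\hat{\mathcal{A}}_k(q, st) \triangleq
\begin{cases} q & Lam_?(q) \\ st(q) & Var_?(q) \end{cases}
$$

Abstract domains:

$$
\begin{array}{rcl}
\hat{\varsigma} \in \widehat{UEval} & = & UCall \times Stack \times Heap \\
\hat{\varsigma} \in \widehat{UApply} & = & ULam \times \widehat{UClos} \times \widehat{CClos} \times Stack \times Heap \\
\hat{\varsigma} \in \widehat{CEval} & = & CCall \times Stack \times Heap \\
\hat{\varsigma} \in \widehat{CApply} & = & \widehat{CClos} \times \widehat{UClos} \times Stack \times Heap \\
\hat{d} \in \widehat{UClos} & = & Pow(ULam) \\
\hat{c} \in \widehat{CClos} & = & CLam + halt \\
fr, tf \in Frame & = & (UVar \rightharpoonup \widehat{UClos}) \cup (CVar \rightharpoonup \widehat{CClos}) \\
st \in Stack & = & Frame^* \\
h \in Heap & = & UVar \rightharpoonup \widehat{UClos}
\end{array}
$$

Stack operations:

$$
\begin{array}{rcl}
pop(tf :: st') & \triangleq & st' \\
push(fr, st) & \triangleq & fr :: st \\
(tf :: st')(v) & \triangleq & tf(v) \\
(tf :: st')[u \mapsto \hat{d}] & \triangleq & tf[u \mapsto \hat{d}] :: st'
\end{array}
$$

Figure 4: Abstract semantics and relevant definitions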

4.1. Abstract semantics. The CFA2 semantics is an abstract interpreter that executes a 
CPS program, using a stack for variable binding and return-point information. 

We describe the stack-management policy with an example. Assume that we run the len
program of section 3. When calling (len '(3) halt) we push a frame $[l \mapsto (3)][k \mapsto halt]$
on the stack. The test (pair? l) is true, so we add the binding $[test \mapsto true]$ to the top
frame and jump to the true branch. We take the cdr of l and add the binding $[rest \mapsto ()]$
to the top frame. We call len again, push a new frame for its arguments and jump to
its body. This time the test is false, so we extend the top frame with $[test \mapsto false]$ and
jump to the false branch. The call to k is a function return, so we pop a frame and pass 0
to (λ(ans) (+ 1 ans k)). Call site (+ 1 ans k) is also a function return, so we pop the
remaining frame and pass 1 to the top-level continuation halt.

In general, we push a frame at function entries and pop at tail calls and at function 
returns. Results of intermediate computations are stored in the top frame. This policy 
enforces two invariants about the abstract interpreter. First, when executing inside a user 
function $(\lambda_l\,(u\ k)\ call)$, the domain of the top frame is a subset of $LV(l)$. Second, the frame
below the top frame is the environment of the current continuation. 

Each variable v in our example was looked up in the top frame, because each lookup
happened while executing inside the lambda that binds v. This is not always the case; in
the first snippet of section 3.2 there is a heap reference to x in $\lambda_1$. When control reaches
that reference, the top frame does not belong to the lambda that binds x. In CFA2, we
look up stack references in the top frame, and heap references in the heap. Stack lookups
below the top frame never happen.

The CFA2 semantics appears in Fig. 4. An abstract value is either an abstract user
closure (member of the set UClos) or an abstract continuation closure (member of CClos). 
An abstract user closure is a set of user lambdas. An abstract continuation closure is either 
a continuation lambda or halt. A frame is a map from variables to abstract values, and a 
stack is a sequence of frames. All stack operations except push are defined for non-empty 
stacks only. A heap is a map from variables to abstract values. It contains only user 
bindings because, without first-class control, every continuation variable is a stack variable. 

On transition from a UEval state $\hat{\varsigma}$ to a UApply state $\hat{\varsigma}'$ (rule [UEA]), we first evaluate
$f$, $e$ and $q$. We evaluate user terms using $\hat{\mathcal{A}}_u$ and continuation terms using $\hat{\mathcal{A}}_k$. We
non-deterministically choose one of the lambdas that flow to $f$ as the operator in $\hat{\varsigma}'$.² The change
to the stack depends on $q$ and $f$. If $q$ is a variable, the call is a tail call so we pop the
stack (case 1). If $q$ is a lambda, it evaluates to a new closure whose environment is the top
frame, hence we do not pop the stack (cases 2, 3). Moreover, if $f$ is a lambda or a heap
reference then we leave the stack unchanged. However, if $f$ is a stack reference, we set $f$'s
value in the top frame to $\{ulam\}$, possibly forgetting other lambdas that flow to $f$. This
"stack filtering" prevents fake rebinding (cf. section 3.2): when we return to $\hat{c}$, we may
reach more stack references of $f$. These references and the current one are bound at the
same time. Since we are committing to $ulam$ in this transition, these references must also
be bound to $ulam$.

In the UApply-to-Eval transition (rule [UAE]), we push a frame for the procedure's
arguments. In addition, if $u$ is a heap variable we must update its binding in the heap. The
join operation $\sqcup$ is defined as:

$$
(h \sqcup [u \mapsto \hat{d}])(v) \triangleq
\begin{cases}
h(v) & v \not\equiv u \\
h(v) \cup \hat{d} & v \equiv u
\end{cases}
$$
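
The join can be written down directly. In this minimal Python sketch the heap is a dictionary from variable names to frozensets of abstract values; treating a variable that is not yet in the heap as bound to the empty set is an assumption of the sketch.

```python
def join(heap, u, d):
    """Return heap joined with [u -> d]: only u's entry changes, by set union."""
    new_heap = dict(heap)                        # leave the input heap untouched
    new_heap[u] = heap.get(u, frozenset()) | d   # unbound u is treated as the empty set
    return new_heap

h0 = {"x": frozenset({"lam1"})}
h1 = join(h0, "x", frozenset({"lam2"}))
h2 = join(h1, "y", frozenset({"lam3"}))
print(h1, h2)
```

Because the heap only grows under this operation, a heap variable accumulates every value ever bound to it, which is the conservative approximation used for heap references.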



In a CEval-to-CApply transition (rule [CEA]), we are preparing for a call to a continuation
so we must reset the stack to the stack of its birth. When q is a variable, the
CEval state is a function return and the continuation's environment is the second stack 
frame. Therefore, we pop a frame before calling clam. When q is a lambda, it is a newly 
created closure thus the stack does not change. Note that the transition is deterministic, 
unlike [UEA]. Since we always know which continuation we are about to call, call/return 
mismatch never happens. For instance, the function len may be called from many places 
in a program, so multiple continuations may flow to k. But, by retrieving k's value from 
the stack, we always return to the correct continuation. 

In the CApply-to-Eval transition (rule [CAE]), our stack policy dictates that we extend
the top frame with the binding for the continuation's parameter u. If u is a heap variable,
we also update the heap.³



² An abstract execution explores one path, but the algorithm that searches the state space considers all
possible executions (cf. section 5), as is the case in the operational formulation of k-CFA.

³ All temporaries created by the CPS transform are stack variables; but a compiler optimization may
rewrite a program to create heap references to temporaries.

Examples. When the analyzed program is not recursive, the stack size is bounded so we
can enumerate all abstract states without diverging. Let's see how the abstract semantics
works on a simple program that applies the identity function twice and returns the result
of the second call. The initial state $\hat{\mathcal{I}}(pr)$ is a UApply.

$$([\![(\lambda (id\ h)\ (id\ 1\ (\lambda_1 (n1)\ (id\ 2\ (\lambda_2 (n2)\ (h\ n2))))))]\!],\ \{[\![(\lambda_3 (x\ k)\ (k\ x))]\!]\},\ halt,\ \langle\rangle,\ \emptyset)$$

All variables in this example are stack variables, so the heap will remain empty throughout
the execution. In frames, we abbreviate lambdas by their labels. By rule [UAE], we push
a frame for id and h and transition to a UEval state.

$$([\![(id\ 1\ (\lambda_1 (n1)\ (id\ 2\ (\lambda_2 (n2)\ (h\ n2)))))]\!],\ \langle [id \mapsto \{\lambda_3\}][h \mapsto halt] \rangle,\ \emptyset)$$

We look up id in the top frame. Since the continuation argument is a lambda, we do not
pop the stack. The next state is a UApply.

$$([\![(\lambda_3 (x\ k)\ (k\ x))]\!],\ \{1\},\ \lambda_1,\ \langle [id \mapsto \{\lambda_3\}][h \mapsto halt] \rangle,\ \emptyset)$$

We push a frame for the arguments of $\lambda_3$ and jump to its body.

$$([\![(k\ x)]\!],\ \langle [x \mapsto \{1\}][k \mapsto \lambda_1],\ [id \mapsto \{\lambda_3\}][h \mapsto halt] \rangle,\ \emptyset)$$

This is a CEval state where the operator is a variable, so we pop a frame.

$$([\![(\lambda_1 (n1)\ (id\ 2\ (\lambda_2 (n2)\ (h\ n2))))]\!],\ \{1\},\ \langle [id \mapsto \{\lambda_3\}][h \mapsto halt] \rangle,\ \emptyset)$$

We extend the top frame to bind n1 and jump to the body of $\lambda_1$.

$$([\![(id\ 2\ (\lambda_2 (n2)\ (h\ n2)))]\!],\ \langle [n1 \mapsto \{1\}][id \mapsto \{\lambda_3\}][h \mapsto halt] \rangle,\ \emptyset)$$

The new call to id is also not a tail call, so we do not pop.

$$([\![(\lambda_3 (x\ k)\ (k\ x))]\!],\ \{2\},\ \lambda_2,\ \langle [n1 \mapsto \{1\}][id \mapsto \{\lambda_3\}][h \mapsto halt] \rangle,\ \emptyset)$$

We push a frame and jump to the body of $\lambda_3$.

$$([\![(k\ x)]\!],\ \langle [x \mapsto \{2\}][k \mapsto \lambda_2],\ [n1 \mapsto \{1\}][id \mapsto \{\lambda_3\}][h \mapsto halt] \rangle,\ \emptyset)$$

We pop a frame and jump to $\lambda_2$.

$$([\![(\lambda_2 (n2)\ (h\ n2))]\!],\ \{2\},\ \langle [n1 \mapsto \{1\}][id \mapsto \{\lambda_3\}][h \mapsto halt] \rangle,\ \emptyset)$$

We extend the top frame to bind n2 and jump to the body of $\lambda_2$.

$$([\![(h\ n2)]\!],\ \langle [n2 \mapsto \{2\}][n1 \mapsto \{1\}][id \mapsto \{\lambda_3\}][h \mapsto halt] \rangle,\ \emptyset)$$

The operator is a variable, so we pop the stack. The next state is a final state, so the
program terminates with value $\{2\}$.

$$(halt,\ \{2\},\ \langle\rangle,\ \emptyset)$$

1CFA would also find the precise answer for this program. However, if we η-expand $\lambda_3$
to $(\lambda_3 (x\ k)\ ((\lambda_4 (y\ k2)\ (k2\ y))\ x\ k))$, 1CFA will return $\{1, 2\}$ because both calls to $\lambda_4$
happen at the same call site. CFA2 is more resilient to η-expansion. It will return the precise
answer in the modified program because the change did not create any heap references.
However, if we change $\lambda_3$ to $(\lambda_3 (x\ k)\ ((\lambda_4 (y\ k2)\ (k2\ x))\ x\ k))$, then both 1 and 2 flow
to the heap reference to x and CFA2 will return $\{1, 2\}$.
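
The example and its two variants can be replayed with a small Python sketch of the rules in Fig. 4. It is a sketch under stated assumptions rather than the authors' implementation: terms use a hypothetical tuple encoding, integer literals evaluate to singleton sets, call-site labels are arbitrary, and all reachable states are enumerated naively, which terminates only because these programs are not recursive.

```python
HALT = "halt"

def label_vars(term, frame, lv, refs):
    """Fill lv (label -> variables of the enclosing user lambda) and refs."""
    tag, label = term[0], term[1]
    if tag == "ulam":
        frame = {term[2], term[3]}          # a user lambda starts a fresh frame
    elif tag == "clam":
        frame.add(term[2])                  # a continuation lambda extends it
    lv[label] = frame
    for g in ([term[-1]] if tag in ("ulam", "clam") else term[2:]):
        if isinstance(g, tuple):
            label_vars(g, frame, lv, refs)
        elif isinstance(g, str):
            refs.append((label, g))

def freeze(d):
    return frozenset(d.items())             # hashable frames and heaps

def bind(frame, var, value):
    return freeze({**dict(frame), var: value})

def join(h, u, d):
    old = dict(h)
    return freeze({**old, u: old.get(u, frozenset()) | d})

def eval_user(e, psi, st, h, lv):
    if not isinstance(e, str):              # lambda, or (assumption) integer literal
        return frozenset([e])
    return dict(st[0])[e] if e in lv[psi] else dict(h)[e]   # stack vs. heap reference

def eval_cont(q, st):
    return dict(st[0])[q] if isinstance(q, str) else q

def successors(state, lv, heap_vars):
    tag = state[0]
    if tag == "eval":
        _, call, st, h = state
        psi = call[1]
        if call[0] == "ucall":                                   # [UEA]
            f, e, q = call[2:]
            d, c = eval_user(e, psi, st, h, lv), eval_cont(q, st)
            for ulam in eval_user(f, psi, st, h, lv):
                if isinstance(q, str):
                    st2 = st[1:]                                 # tail call: pop
                elif isinstance(f, str) and f in lv[psi]:        # stack filtering
                    st2 = (bind(st[0], f, frozenset([ulam])),) + st[1:]
                else:
                    st2 = st
                yield ("uapply", ulam, d, c, st2, h)
        else:                                                    # [CEA]
            q, e = call[2:]
            st2 = st[1:] if isinstance(q, str) else st
            yield ("capply", eval_cont(q, st), eval_user(e, psi, st, h, lv), st2, h)
    elif tag == "uapply":                                        # [UAE]
        _, (_, _, u, k, body), d, c, st, h = state
        yield ("eval", body, (freeze({u: d, k: c}),) + st,
               join(h, u, d) if u in heap_vars else h)
    elif state[1] != HALT:                                       # [CAE]
        _, (_, _, u, body), d, st, h = state
        yield ("eval", body, (bind(st[0], u, d),) + st[1:],
               join(h, u, d) if u in heap_vars else h)

def analyze(program, input_lams):
    """Explore all reachable abstract states; return the values passed to halt."""
    lv, refs = {}, []
    for lam in (program, *input_lams):
        label_vars(lam, None, lv, refs)
    heap_vars = {v for psi, v in refs if v not in lv[psi]}
    init = ("uapply", program, frozenset(input_lams), HALT, (), freeze({}))
    seen, work, result = {init}, [init], set()
    while work:
        state = work.pop()
        if state[0] == "capply" and state[1] == HALT:
            result |= state[2]
        for nxt in successors(state, lv, heap_vars):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return result

def id_variant(returned):
    """lambda_3 eta-expanded through lambda_4, which returns 'y' or 'x'."""
    lam4 = ("ulam", 4, "y", "k2", ("ccall", 40, "k2", returned))
    return ("ulam", 3, "x", "k", ("ucall", 31, lam4, "x", "k"))

ID = ("ulam", 3, "x", "k", ("ccall", 30, "k", "x"))
PROG = ("ulam", 0, "id", "h",
        ("ucall", 10, "id", 1,
         ("clam", 1, "n1",
          ("ucall", 11, "id", 2,
           ("clam", 2, "n2", ("ccall", 12, "h", "n2"))))))

print(analyze(PROG, [ID]))                # identity applied twice
print(analyze(PROG, [id_variant("y")]))   # eta-expanded, stack references only
print(analyze(PROG, [id_variant("x")]))   # heap reference to x inside lambda_4
```

The three calls should yield {2}, {2} and {1, 2} respectively, matching the discussion above: only the third variant reads x from the heap, where the bindings from both calls have been joined.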
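$$
\begin{array}{rcl}
|([\![(g_1 \ldots g_n)^\psi]\!],\ \beta,\ ve,\ t)|_{ca} & = &
  ([\![(g_1 \ldots g_n)^\psi]\!],\ toStack(LV(\psi), \beta, ve),\ |ve|_{ca}) \\[4pt]
|(([\![(\lambda_l\,(u\ k)\ call)]\!],\ \beta),\ d,\ c,\ ve,\ t)|_{ca} & = &
  ([\![(\lambda_l\,(u\ k)\ call)]\!],\ |d|_{ca},\ |c|_{ca},\ st,\ |ve|_{ca}) \\
 & & \text{where } st = \begin{cases}
     \langle\rangle & c = halt \\
     toStack(LV(\gamma), \beta', ve) & c = ([\![(\lambda_\gamma\,(u')\ call')]\!],\ \beta')
   \end{cases} \\[4pt]
|(([\![(\lambda_\gamma\,(u)\ call)]\!],\ \beta),\ d,\ ve,\ t)|_{ca} & = &
  ([\![(\lambda_\gamma\,(u)\ call)]\!],\ |d|_{ca},\ toStack(LV(\gamma), \beta, ve),\ |ve|_{ca}) \\
|(halt,\ d,\ ve,\ t)|_{ca} & = & (halt,\ |d|_{ca},\ \langle\rangle,\ |ve|_{ca}) \\
|([\![(\lambda_l\,(u\ k)\ call)]\!],\ \beta)|_{ca} & = & \{[\![(\lambda_l\,(u\ k)\ call)]\!]\} \\
|([\![(\lambda_\gamma\,(u)\ call)]\!],\ \beta)|_{ca} & = & [\![(\lambda_\gamma\,(u)\ call)]\!] \\
|halt|_{ca} & = & halt \\
|ve|_{ca} & = & \{\,(u,\ \bigsqcup_t |ve(u, t)|_{ca}) : H_?(u)\,\}
\end{array}
$$

$$
toStack(\{u_1, \ldots, u_n, k\},\ \beta,\ ve) \triangleq
\begin{cases}
\langle [u_i \mapsto \hat{d}_i][k \mapsto halt] \rangle & halt = ve(k, \beta(k)) \\
[u_i \mapsto \hat{d}_i][k \mapsto [\![(\lambda_\gamma\,(u)\ call)]\!]] :: st &
  ([\![(\lambda_\gamma\,(u)\ call)]\!],\ \beta') = ve(k, \beta(k))
\end{cases}
$$

where $\hat{d}_i = |ve(u_i, \beta(u_i))|_{ca}$ and $st = toStack(LV(\gamma), \beta', ve)$

Figure 5: From concrete states to abstract states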
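\[
\begin{aligned}
(call, st_1, h_1) \sqsubseteq (call, st_2, h_2) \quad &\text{iff} \quad st_1 \sqsubseteq st_2 \wedge h_1 \sqsubseteq h_2 \\
(ulam, \hat{d}_1, \hat{c}, st_1, h_1) \sqsubseteq (ulam, \hat{d}_2, \hat{c}, st_2, h_2) \quad &\text{iff} \quad \hat{d}_1 \sqsubseteq \hat{d}_2 \wedge st_1 \sqsubseteq st_2 \wedge h_1 \sqsubseteq h_2 \\
(\hat{c}, \hat{d}_1, st_1, h_1) \sqsubseteq (\hat{c}, \hat{d}_2, st_2, h_2) \quad &\text{iff} \quad \hat{d}_1 \sqsubseteq \hat{d}_2 \wedge st_1 \sqsubseteq st_2 \wedge h_1 \sqsubseteq h_2 \\
h_1 \sqsubseteq h_2 \quad &\text{iff} \quad h_1(u) \sqsubseteq h_2(u) \text{ for each } u \in dom(h_1) \\
tf_1 :: st_1 \sqsubseteq tf_2 :: st_2 \quad &\text{iff} \quad tf_1 \sqsubseteq tf_2 \wedge st_1 \sqsubseteq st_2 \\
\langle\rangle &\sqsubseteq \langle\rangle \\
tf_1 \sqsubseteq tf_2 \quad &\text{iff} \quad tf_1(v) \sqsubseteq tf_2(v) \text{ for each } v \in dom(tf_1) \\
\hat{d}_1 \sqsubseteq \hat{d}_2 \quad &\text{iff} \quad \hat{d}_1 \subseteq \hat{d}_2 \\
\hat{c} &\sqsubseteq \hat{c}
\end{aligned}
\]

Figure 6: The $\sqsubseteq$ relation on abstract states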
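
As an illustration only, the ordering of Fig. 6 can be sketched in Python. The encoding is an assumption of this sketch, not part of the formal development: abstract values are frozensets of lambda labels, continuations are plain strings compared by equality, frames and heaps are dictionaries, a stack is a list of frames with the top frame first, and a heap variable that is missing from a heap is treated as bound to the empty set.

```python
def value_leq(a, b):
    """Sets of lambdas are ordered by inclusion; continuations only by equality."""
    if isinstance(a, frozenset) and isinstance(b, frozenset):
        return a <= b
    return a == b

def frame_leq(tf1, tf2):
    """tf1 is below tf2 if every binding of tf1 is approximated in tf2."""
    return all(v in tf2 and value_leq(tf1[v], tf2[v]) for v in tf1)

def stack_leq(st1, st2):
    """Stacks are compared frame by frame and must have the same length."""
    return len(st1) == len(st2) and all(frame_leq(a, b) for a, b in zip(st1, st2))

def heap_leq(h1, h2):
    """Assumption: an unbound heap variable behaves like the empty set."""
    return all(value_leq(h1[u], h2.get(u, frozenset())) for u in h1)

def eval_state_leq(s1, s2):
    """Order on Eval states (call, st, h): same call site, pointwise smaller parts."""
    (call1, st1, h1), (call2, st2, h2) = s1, s2
    return call1 == call2 and stack_leq(st1, st2) and heap_leq(h1, h2)
```

For example, a frame that binds x to {λ3} is below a frame that binds x to {λ3, λ4}, provided both bind k to the same continuation.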

4.2. Correctness of the abstract semantics. We proceed to show that the CFA2 semantics
safely approximates the concrete semantics. First, we define a map $|\cdot|_{ca}$ from concrete
to abstract states. Next, we show that if $\varsigma$ transitions to $\varsigma'$ in the concrete semantics, the
abstract counterpart $|\varsigma|_{ca}$ of $\varsigma$ transitions to a state $\hat{\varsigma}'$ which approximates $|\varsigma'|_{ca}$. Hence, we
ensure that the possible behaviors of the abstract interpreter include the actual run-time
behavior of the program.

The map $|\cdot|_{ca}$ appears in Fig. 5. The abstraction of an Eval state $\varsigma$ of the form
$([\![(g_1 \dots g_n)^{\psi}]\!], \beta, ve, t)$ is an Eval state $\hat{\varsigma}$ with the same call site. Since $\varsigma$ does not have
a stack, we must expose stack-related information hidden in $\beta$ and $ve$. Assume that $\lambda_l$ is
the innermost user lambda that contains $\psi$. To reach $\psi$, control passed from a UApply
state $\varsigma'$ over $\lambda_l$. According to our stack policy, the top frame contains bindings for the
formals of $\lambda_l$ and any temporaries added along the path from $\varsigma'$ to $\varsigma$. Therefore, the
domain of the top frame is a subset of $LV(l)$, i.e., a subset of $LV(\psi)$. For each user variable
$u_i \in (LV(\psi) \cap dom(\beta))$, the top frame contains $[u_i \mapsto |ve(u_i, \beta(u_i))|_{ca}]$. Let $k$ be the sole
continuation variable in $LV(\psi)$. If $ve(k, \beta(k))$ is halt (the return continuation is the
top-level continuation), the rest of the stack is empty. If $ve(k, \beta(k))$ is $([\![(\lambda_\gamma\ (u)\ call)]\!], \beta')$, the
second frame is for the user lambda in which $\lambda_\gamma$ was born, and so forth: proceeding through
the stack, we add a frame for each live activation of a user lambda until we reach halt.

The abstraction of a UApply state over $\langle [\![(\lambda_l\ (u\ k)\ call)]\!], \beta \rangle$ is a UApply state $\hat{\varsigma}$ whose
operator is $[\![(\lambda_l\ (u\ k)\ call)]\!]$. The stack of $\hat{\varsigma}$ is the stack in which the continuation argument
was created, and we compute it using toStack as above.

Abstracting a CApply is similar to the UApply case, only now the top frame is the 
environment of the continuation operator. Note that the abstraction maps drop the time 
of the concrete states, since the abstract states do not use times. 

The abstraction of a user closure is the singleton set with the corresponding lambda. 
The abstraction of a continuation closure is the corresponding lambda. When abstracting 
a variable environment ve, we only keep heap variables. 
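
The stack reconstruction performed by toStack can be sketched as follows. This is a minimal illustration under stated assumptions, not the formal definition: closures are encoded as tuples `("uclo", label, beta)` and `("cclo", label, beta)`, the top-level continuation is the string `"halt"`, `ve` is a dictionary keyed by (variable, time) pairs, and the hypothetical table `lv_of` maps the label of a continuation lambda to the user variables and the continuation variable of the user lambda in which it was born.

```python
HALT = "halt"

def abs_value(v):
    """Abstraction of a value: a user closure becomes a singleton set of its lambda,
    a continuation closure becomes its lambda, and halt stays halt."""
    if v == HALT:
        return HALT
    kind, lam, _beta = v
    return frozenset({lam}) if kind == "uclo" else lam

def to_stack(user_vars, cont_var, beta, ve, lv_of):
    """Rebuild the abstract stack (top frame first) from beta and ve."""
    # Only variables that are already bound (in dom(beta)) appear in the frame.
    frame = {u: abs_value(ve[(u, beta[u])]) for u in user_vars if u in beta}
    k_val = ve[(cont_var, beta[cont_var])]
    frame[cont_var] = abs_value(k_val)
    if k_val == HALT:
        return [frame]                      # the rest of the stack is empty
    _kind, lam, beta2 = k_val
    users, k2 = lv_of[lam]                  # variables of the lambda where lam was born
    return [frame] + to_stack(users, k2, beta2, ve, lv_of)
```

With `ve = {("id", 0): ("uclo", "l3", {}), ("h", 0): "halt", ("x", 1): ("uclo", "l3", {}), ("k", 1): ("cclo", "l2", {"id": 0, "h": 0})}` and `lv_of = {"l2": (["id", "n2"], "h")}`, the call `to_stack(["x"], "k", {"x": 1, "k": 1}, ve, lv_of)` yields a two-frame stack: the frame of the callee on top, followed by the frame of the activation that created the continuation.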

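We can now state our simulation theorem. The proof proceeds by case analysis on the
concrete transition relation. The relation $\hat{\varsigma}_1 \sqsubseteq \hat{\varsigma}_2$ is a partial order on abstract states and
can be read as "$\hat{\varsigma}_1$ is more precise than $\hat{\varsigma}_2$" (Fig. 6). The proof can be found in the appendix.

Theorem 4.1 (Simulation). If $\varsigma \to \varsigma'$ and $|\varsigma|_{ca} \sqsubseteq \hat{\varsigma}$, then there exists $\hat{\varsigma}'$ such that $\hat{\varsigma} \leadsto \hat{\varsigma}'$
and $|\varsigma'|_{ca} \sqsubseteq \hat{\varsigma}'$.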

5. Computing CFA2 

5.1. Pushdown models and summarization. In section 3 we saw that a monovariant
analysis like 0CFA treats the control-flow graph of len as a finite-state machine (FSM),
where all paths are valid executions. For k > 0, k-CFA still approximates len as an FSM,
albeit a larger one that has several copies of each procedure, caused by different call strings.

But in reality, calls and returns match; the call from 2 returns to 3 and each call from
10 returns to 11. Thus, by thinking of executions as strings in a context-free language, we
can do more precise flow analysis. We can achieve this by approximating len as a pushdown
system (PDS) [3, 10]. A PDS is similar to a pushdown automaton, except it does not read
input from a tape. For illustration purposes, we take the (slightly simplified) view that the
state of a PDS is a pair of a program point and a stack. The transition rules for call nodes
push the return point on the stack:

\[ (2, s) \hookrightarrow (5, 3 :: s), \qquad (10, s) \hookrightarrow (5, 11 :: s) \]

Function exits pop the node at the top of the stack and jump to it:

\[ (13, n :: s) \hookrightarrow (n, s) \]

All other nodes transition to their successor(s) and leave the stack unchanged, e.g.

\[ (3, s) \hookrightarrow (4, s), \qquad (7, s) \hookrightarrow (8, s), \qquad (7, s) \hookrightarrow (9, s) \]

Suppose we want to find all nodes reachable from 1. Obviously, we cannot do it by
enumerating all states, because the stack can grow without bound. Thus, algorithms for
pushdown reachability use a dynamic programming technique called summarization. The
intuition behind summarization is to flow facts from a program point n with an empty stack
to a point n' in the same procedure. We say that n' is same-context reachable from n. These
facts are then suitably combined to get flow facts for the whole program.
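
To make the state explosion concrete, the transition rules above can be sketched in Python. The encoding is hypothetical: node numbers follow the text, the tables `INTRA`, `CALLS` and `EXITS` are names introduced for this sketch, and the edge from 12 to 13 is an assumption, since the text does not list every edge of len's graph.

```python
# Intraprocedural successors of len's control-flow graph (12 -> 13 is assumed).
INTRA = {1: [2], 3: [4], 5: [6], 6: [7], 7: [8, 9], 8: [13], 9: [10], 11: [12], 12: [13]}
CALLS = {2: (5, 3), 10: (5, 11)}   # call node -> (callee entry, return point)
EXITS = {13}

def step(config):
    """Successor configurations of a PDS configuration (node, stack)."""
    node, stack = config
    if node in CALLS:
        entry, ret = CALLS[node]
        return {(entry, (ret,) + stack)}        # push the return point
    if node in EXITS:
        if not stack:
            return set()                         # nothing to return to
        return {(stack[0], stack[1:])}          # pop the top and jump to it
    return {(succ, stack) for succ in INTRA.get(node, [])}

def reachable_configs(start, max_depth):
    """Explore configurations whose stack holds at most max_depth return points."""
    seen, todo = {start}, [start]
    while todo:
        for nxt in step(todo.pop()):
            if len(nxt[1]) <= max_depth and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen
```

Every increase of `max_depth` adds new configurations, although the set of program points that occur in them stops growing early. This is the observation that summarization exploits.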

We use summarization to explore the state space in CFA2. Our algorithm is based
on Sharir and Pnueli's functional approach [25, pg. 207], adapted to the more modern
terminology of Reps et al. [23]. Summarization requires that we know all call sites of
a function. Therefore, it does not apply directly to higher-order languages, because we
cannot find the call sites of a function by looking at a program's source code. We need a
search-based variant of summarization, which records callers as it discovers them.

We illustrate our variant on len. We find reachable nodes by recording path edges, 
i.e., edges whose source is the entry of a procedure and target is some program point in 
the same procedure. Path edges should not be confused with the edges already present 
in len's control-flow graph. They are artificial edges used by the analysis to represent 
intraprocedural paths, hence the name. From 1 we can go to 2, so we record (1, 1) and 
(1, 2). Then 2 calls len, so we record the call (2, 5) and jump to 5. In len, we reach 6 and 
7 and record (5,5), (5,6) and (5,7). We do not assume anything about the result of the 
test, so we must follow both branches. By following the false branch, we discover (5,8) and 
(5, 13). Node 13 is an exit, so each caller of len can reach its corresponding return point. 
We keep track of this fact by recording the summary edge (5, 13). We have only seen a call 
from 2, so we return to 3 and record (1, 3). Finally, we record (1,4), which is the end of the 
program. By analyzing the true branch, we discover edges (5, 9) and (5, 10), and record the 
new call (10, 5). Reachability inside len does not depend on its calling context, so from the 
summary edge (5, 13) we infer that 10 can reach 11 and we record (5, 11) and subsequently 
(5, 12). At this point, we have discovered all possible path edges. 
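
The walk-through above can be replayed with a small search-based tabulation, sketched here in Python for the first-order graph of len. This is an illustration under assumptions, not the CFA2 algorithm itself: the tables `INTRA`, `CALLS` and `EXITS` are a hypothetical encoding of the graph, and the edge from 12 to 13 is assumed.

```python
from collections import defaultdict

# Control-flow graph of len as described in the text (12 -> 13 is assumed).
INTRA = {1: [2], 3: [4], 5: [6], 6: [7], 7: [8, 9], 8: [13], 9: [10], 11: [12], 12: [13]}
CALLS = {2: (5, 3), 10: (5, 11)}   # call node -> (callee entry, return point)
EXITS = {13}

def summarize(start):
    """Return (path edges, summary edges) found by a search from `start`."""
    seen = {(start, start)}
    work = [(start, start)]
    callers = defaultdict(set)   # callee entry -> {(caller's entry, return point)}
    summaries = set()            # (entry, exit) pairs

    def propagate(entry, node):
        if (entry, node) not in seen:
            seen.add((entry, node))
            work.append((entry, node))

    while work:
        entry, node = work.pop()
        if node in CALLS:
            callee, ret = CALLS[node]
            callers[callee].add((entry, ret))    # record the caller as we discover it
            propagate(callee, callee)
            if any(e == callee for e, _ in summaries):
                propagate(entry, ret)            # reuse an existing summary
        elif node in EXITS:
            summaries.add((entry, node))
            for caller_entry, ret in callers[entry]:
                propagate(caller_entry, ret)     # return to every known caller
        else:
            for succ in INTRA.get(node, []):
                propagate(entry, succ)
    return seen, summaries
```

Calling `summarize(1)` produces the thirteen path edges listed in the text and the single summary edge (5, 13), regardless of the order in which the two branches of the test at node 7 are explored.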

Summarization works because we can temporarily forget the caller while analyzing 
inside a procedure, and remember it when we are about to return. A consequence is that if 
from node n with an empty stack we can reach n' with stack s, then n with s' can go to n' 
with append(s, s'). 

5.2. Local semantics. Summarization-based algorithms operate on a finite set of program
points. Hence, we cannot use (an infinite number of) abstract states as program points.
For this reason, we introduce local states and define a map $|\cdot|_{al}$ from abstract to local states
(Fig. 7). Intuitively, a local state is like an abstract state but with a single frame instead
of a stack. Discarding the rest of the stack makes the local state space finite; keeping the
top frame allows precise lookups for stack references.

The local semantics describes executions that do not touch the rest of the stack (in
other words, executions where functions do not return). Thus, a CEval state with call site
$(k\ e)^{\gamma}$ has no successor in this semantics. Since functions do not call their continuations,
the frames of local states contain only user bindings. Local steps are otherwise similar to
abstract steps. The metavariable $\tilde{\varsigma}$ ranges over local states. We define the map $|\cdot|_{cl}$ from
concrete to local states to be $|\cdot|_{al} \circ |\cdot|_{ca}$.

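\[
\tilde{\mathcal{A}}_u(e, \psi, tf, h) =
\begin{cases}
\{e\} & Lam_?(e) \\
tf(e) & S_?(\psi, e) \\
h(e) & H_?(\psi, e)
\end{cases}
\]

\[
\begin{aligned}
&[\text{UEA}] \quad ([\![(f\ e\ q)^{l}]\!], tf, h) \approx> (ulam, \hat{d}, h) \\
&\qquad ulam \in \tilde{\mathcal{A}}_u(f, l, tf, h) \qquad \hat{d} = \tilde{\mathcal{A}}_u(e, l, tf, h) \\[4pt]
&[\text{UAE}] \quad ([\![(\lambda_l\ (u\ k)\ call)]\!], \hat{d}, h) \approx> (call, [u \mapsto \hat{d}], h') \\
&\qquad h' = \begin{cases} h \sqcup [u \mapsto \hat{d}] & H_?(u) \\ h & S_?(u) \end{cases} \\[4pt]
&[\text{CEA}] \quad ([\![(clam\ e)^{\gamma}]\!], tf, h) \approx> (clam, \hat{d}, tf, h) \\
&\qquad \hat{d} = \tilde{\mathcal{A}}_u(e, \gamma, tf, h) \\[4pt]
&[\text{CAE}] \quad ([\![(\lambda_\gamma\ (u)\ call)]\!], \hat{d}, tf, h) \approx> (call, tf', h') \\
&\qquad tf' = tf[u \mapsto \hat{d}] \qquad h' = \begin{cases} h \sqcup [u \mapsto \hat{d}] & H_?(u) \\ h & S_?(u) \end{cases}
\end{aligned}
\]

Local domains:

\[
\begin{aligned}
\widetilde{Eval} &= Call \times \widetilde{Stack} \times Heap \\
\widetilde{UApply} &= ULam \times \widehat{UClos} \times Heap \\
\widetilde{CApply} &= \widehat{CClos} \times \widehat{UClos} \times \widetilde{Stack} \times Heap \\
\widetilde{Frame} &= UVar \rightharpoonup \widehat{UClos} \\
\widetilde{Stack} &= \widetilde{Frame}
\end{aligned}
\]

Abstract to local maps:

\[
\begin{aligned}
|(call, st, h)|_{al} &= (call, |st|_{al}, h) \\
|(ulam, \hat{d}, \hat{c}, st, h)|_{al} &= (ulam, \hat{d}, h) \\
|(\hat{c}, \hat{d}, st, h)|_{al} &= (\hat{c}, \hat{d}, |st|_{al}, h) \\
|tf :: st'|_{al} &= \{\, (u, tf(u)) : UVar_?(u) \,\} \\
|\langle\rangle|_{al} &= \emptyset
\end{aligned}
\]

Figure 7: Local semantics

We can now see how the local semantics fits in a summarization algorithm for CFA2.
Essentially, CFA2 approximates a higher-order program as a PDS. The local semantics
describes the PDS transitions that do not return (intraprocedural steps and function calls).
We discover return points by recording callers and summary edges.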
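
The variable lookup and the binding step of the local semantics (Fig. 7) can be sketched in Python as follows. The encoding is an assumption of this sketch: `lambdas` is the set of lambda labels, `stack_refs` is a set of (call-site label, variable) pairs that stand for the stack references, `heap_vars` is the set of variables that have heap references, and a variable missing from the heap is treated as bound to the empty set.

```python
def eval_atom(e, psi, tf, h, lambdas, stack_refs):
    """Evaluate an operand e that appears at call site psi."""
    if e in lambdas:
        return frozenset({e})            # a lambda evaluates to itself
    if (psi, e) in stack_refs:
        return tf[e]                     # stack reference: precise lookup in the top frame
    return h.get(e, frozenset())         # heap reference: lookup in the heap

def bind(u, d, tf, h, heap_vars):
    """Bind u to d in the frame and, if u has heap references, join d into the heap.
    For a user-lambda application the caller passes an empty frame for tf."""
    new_tf = {**tf, u: d}
    new_h = dict(h)
    if u in heap_vars:
        new_h[u] = new_h.get(u, frozenset()) | d
    return new_tf, new_h
```

Both helpers return fresh dictionaries, so the same frame and heap can be shared by several states without aliasing problems.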

Summarization distinguishes between different kinds of states: entries, exits, calls, re- 
turns and inner states. CPS lends itself naturally to such a categorization: 

• A UApply state corresponds to an entry node — control is about to enter the body of a 
function. 

• A CEval state where the operator is a variable is an exit node — a function is about to 
pass its result to its context. 

• A CEval state where the operator is a lambda is an inner state. 

• A UEval state where the continuation argument is a variable is an exit, since at tail calls
control does not return to the caller.

• A UEval state where the continuation argument is a lambda is a call.

• A CApply state is a return if its predecessor is an exit, or an inner state if its predecessor
is also an inner state. Our algorithm will not need to distinguish between the two kinds
of CApplys; the difference is just conceptual.
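
The categorization above amounts to a small case analysis, sketched here in Python. The function name and the string tags are hypothetical; the returned names follow the case labels used by the workset algorithm of Fig. 8.

```python
def classify(kind, is_lambda=None):
    """Map a local state to its role in summarization.

    kind      -- "UApply", "CApply", "CEval" or "UEval"
    is_lambda -- for CEval: whether the operator is a lambda;
                 for UEval: whether the continuation argument is a lambda
    """
    if kind == "UApply":
        return "Entry"
    if kind == "CApply":
        return "CApply"          # return or inner state; no need to distinguish
    if kind == "CEval":
        return "Inner-CEval" if is_lambda else "Exit-CEval"
    if kind == "UEval":
        return "Call" if is_lambda else "Exit-TC"
    raise ValueError(f"unknown state kind: {kind}")
```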

Last, we generalize the notion of summary edges to handle tail recursion. Consider an 
earlier example, written in CPS. 
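((λ(app id k)
   (app id 1 (λ1(n1) (app id 2 (λ2(n2) (+ n1 n2 k))))))
 (λ(f e k) (f e k))
 (λ(x k) (k x))
 halt)

01  Summary, Callers, TCallers, Final $\leftarrow \emptyset$
02  Seen, W $\leftarrow \{(\tilde{\mathcal{I}}(pr), \tilde{\mathcal{I}}(pr))\}$
03  while W $\neq \emptyset$
04    remove $(\tilde{\varsigma}_1, \tilde{\varsigma}_2)$ from W
05    switch $\tilde{\varsigma}_2$
06      case $\tilde{\varsigma}_2$ of Entry, CApply, Inner-CEval
07        for each $\tilde{\varsigma}_3$ in succ($\tilde{\varsigma}_2$) Propagate($\tilde{\varsigma}_1, \tilde{\varsigma}_3$)
08      case $\tilde{\varsigma}_2$ of Call
09        for each $\tilde{\varsigma}_3$ in succ($\tilde{\varsigma}_2$)
10          Propagate($\tilde{\varsigma}_3, \tilde{\varsigma}_3$)
11          insert $(\tilde{\varsigma}_1, \tilde{\varsigma}_2, \tilde{\varsigma}_3)$ in Callers
12          for each $(\tilde{\varsigma}_3, \tilde{\varsigma}_4)$ in Summary Update($\tilde{\varsigma}_1, \tilde{\varsigma}_2, \tilde{\varsigma}_3, \tilde{\varsigma}_4$)
13      case $\tilde{\varsigma}_2$ of Exit-CEval
14        if $\tilde{\varsigma}_1 = \tilde{\mathcal{I}}(pr)$ then
15          Final($\tilde{\varsigma}_2$)
16        else
17          insert $(\tilde{\varsigma}_1, \tilde{\varsigma}_2)$ in Summary
18          for each $(\tilde{\varsigma}_3, \tilde{\varsigma}_4, \tilde{\varsigma}_1)$ in Callers Update($\tilde{\varsigma}_3, \tilde{\varsigma}_4, \tilde{\varsigma}_1, \tilde{\varsigma}_2$)
19          for each $(\tilde{\varsigma}_3, \tilde{\varsigma}_4, \tilde{\varsigma}_1)$ in TCallers Propagate($\tilde{\varsigma}_3, \tilde{\varsigma}_2$)
20      case $\tilde{\varsigma}_2$ of Exit-TC
21        for each $\tilde{\varsigma}_3$ in succ($\tilde{\varsigma}_2$)
22          Propagate($\tilde{\varsigma}_3, \tilde{\varsigma}_3$)
23          insert $(\tilde{\varsigma}_1, \tilde{\varsigma}_2, \tilde{\varsigma}_3)$ in TCallers
24          for each $(\tilde{\varsigma}_3, \tilde{\varsigma}_4)$ in Summary Propagate($\tilde{\varsigma}_1, \tilde{\varsigma}_4$)

Propagate($\tilde{\varsigma}_1, \tilde{\varsigma}_2$) =
25  if $(\tilde{\varsigma}_1, \tilde{\varsigma}_2)$ not in Seen then insert $(\tilde{\varsigma}_1, \tilde{\varsigma}_2)$ in Seen and W

Update($\tilde{\varsigma}_1, \tilde{\varsigma}_2, \tilde{\varsigma}_3, \tilde{\varsigma}_4$) =
26  $\tilde{\varsigma}_1$ of the form $([\![(\lambda_{l_1}\ (u_1\ k_1)\ call_1)]\!], \hat{d}_1, h_1)$
27  $\tilde{\varsigma}_2$ of the form $([\![(f\ e_2\ (\lambda_{\gamma_2}\ (u_2)\ call_2))^{l_2}]\!], tf_2, h_2)$
28  $\tilde{\varsigma}_3$ of the form $([\![(\lambda_{l_3}\ (u_3\ k_3)\ call_3)]\!], \hat{d}_3, h_2)$
29  $\tilde{\varsigma}_4$ of the form $([\![(k_4\ e_4)^{\gamma_4}]\!], tf_4, h_4)$
30  $\hat{d} \leftarrow \tilde{\mathcal{A}}_u(e_4, \gamma_4, tf_4, h_4)$
31  $tf \leftarrow tf_2[f \mapsto \{[\![(\lambda_{l_3}\ (u_3\ k_3)\ call_3)]\!]\}]$ if $S_?(l_2, f)$; $tf \leftarrow tf_2$ if $H_?(l_2, f) \vee Lam_?(f)$
32  $\tilde{\varsigma} \leftarrow ([\![(\lambda_{\gamma_2}\ (u_2)\ call_2)]\!], \hat{d}, tf, h_4)$
33  Propagate($\tilde{\varsigma}_1, \tilde{\varsigma}$)

Final($\tilde{\varsigma}$) =
34  $\tilde{\varsigma}$ of the form $([\![(k\ e)^{\gamma}]\!], tf, h)$
35  insert $(\mathit{halt}, \tilde{\mathcal{A}}_u(e, \gamma, tf, h), \emptyset, h)$ in Final

Figure 8: CFA2 workset algorithm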

The call (f e k) in the body of app is a tail call, so no continuation is born there. Upon
return from the first call to id, we must be careful to pass the result to $\lambda_1$. Also, we must
restore the environment of the first call to app, not the environment of the tail call.
Similarly, the second call to id must return to $\lambda_2$ and restore the correct environment. We achieve
these by recording a "cross-procedure" summary from the entry of app to call site (k x),
which is the exit of id. This transitive nature of summaries is essential for tail recursion.

5.3. Summarization for CFA2. The algorithm for CFA2 is shown in Fig. 8. It is a
search-based summarization for higher-order programs with tail calls. Its goal is to compute which
local states are reachable from the initial state of a program through paths that respect
call/return matching.

Overview of the algorithm's structure. The algorithm uses a workset W, which
contains path edges and summaries to be examined. An edge $(\tilde{\varsigma}_1, \tilde{\varsigma}_2)$ is an ordered pair of local
states. We call $\tilde{\varsigma}_1$ the source and $\tilde{\varsigma}_2$ the target of the edge. At every iteration, we remove
an edge from W and process it, potentially adding new edges in W. We stop when W is
empty.

The algorithm maintains several sets. The results of the analysis are stored in the set
Seen. It contains path edges (from a procedure entry to a state in the same procedure)
and summary edges (from an entry to a CEval exit, not necessarily in the same procedure).
The target of an edge in Seen is reachable from the source and from the initial state (cf.
theorem 5.3). Summaries are also stored in Summary. Final records final states, i.e.,
CApplys that call halt with a return value for the whole program. Callers contains triples
$(\tilde{\varsigma}_1, \tilde{\varsigma}_2, \tilde{\varsigma}_3)$, where $\tilde{\varsigma}_1$ is an entry, $\tilde{\varsigma}_2$ is a call in the same procedure and $\tilde{\varsigma}_3$ is the entry of the
callee. TCallers contains triples $(\tilde{\varsigma}_1, \tilde{\varsigma}_2, \tilde{\varsigma}_3)$, where $\tilde{\varsigma}_1$ is an entry, $\tilde{\varsigma}_2$ is a tail call in the same
procedure and $\tilde{\varsigma}_3$ is the entry of the callee. The initial state $\tilde{\mathcal{I}}(pr)$ is defined as $|\mathcal{I}(pr)|_{cl}$.
The helper function succ($\tilde{\varsigma}$) returns the successor(s) of $\tilde{\varsigma}$ according to the local semantics.

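Edge processing. Each edge $(\tilde{\varsigma}_1, \tilde{\varsigma}_2)$ is processed in one of four ways, depending on $\tilde{\varsigma}_2$. If
$\tilde{\varsigma}_2$ is an entry, a return or an inner state (line 6), then its successor $\tilde{\varsigma}_3$ is a state in the same
procedure. Since $\tilde{\varsigma}_2$ is reachable from $\tilde{\varsigma}_1$, $\tilde{\varsigma}_3$ is also reachable from $\tilde{\varsigma}_1$. If we have not already
recorded the edge $(\tilde{\varsigma}_1, \tilde{\varsigma}_3)$, we do it now (line 25).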

If $\tilde{\varsigma}_2$ is a call (line 8) then $\tilde{\varsigma}_3$ is the entry of the callee, so we propagate $(\tilde{\varsigma}_3, \tilde{\varsigma}_3)$ instead
of $(\tilde{\varsigma}_1, \tilde{\varsigma}_3)$ (line 10). Also, we record the call in Callers. If an exit $\tilde{\varsigma}_4$ is reachable from $\tilde{\varsigma}_3$, it
should return to the continuation born at $\tilde{\varsigma}_2$ (line 12). The function Update is responsible
for computing the return state. We find the return value $\hat{d}$ by evaluating the expression $e_4$
passed to the continuation (lines 29-30). Since we are returning to $\lambda_{\gamma_2}$, we must restore the
environment of its creation which is $tf_2$ (possibly with stack filtering, line 31). The new
state $\tilde{\varsigma}$ is the corresponding return of $\tilde{\varsigma}_2$, so we propagate $(\tilde{\varsigma}_1, \tilde{\varsigma})$ (lines 32-33).

If $\tilde{\varsigma}_2$ is a CEval exit and $\tilde{\varsigma}_1$ is the initial state (lines 14-15), then $\tilde{\varsigma}_2$'s successor is a final
state (lines 34-35). If $\tilde{\varsigma}_1$ is some other entry, we record the edge in Summary and pass the
result of $\tilde{\varsigma}_2$ to the callers of $\tilde{\varsigma}_1$ (lines 17-18). Last, consider the case of a tail call $\tilde{\varsigma}_4$ to $\tilde{\varsigma}_1$
(line 19). No continuation is born at $\tilde{\varsigma}_4$. Thus, we must find where $\tilde{\varsigma}_3$ (the entry that led to
the tail call) was called from. Then again, all calls to $\tilde{\varsigma}_3$ may be tail calls, in which case we
keep searching further back in the call chain to find a return point. We do the backward
search by transitively adding a cross-procedure summary from $\tilde{\varsigma}_3$ to $\tilde{\varsigma}_2$ (line 25).

If $\tilde{\varsigma}_2$ is a tail call (line 20), we find its successors and record the call in TCallers (lines
21-23). If a successor of $\tilde{\varsigma}_2$ goes to an exit, we propagate a cross-procedure summary
transitively (line 24). Figure 9 shows a complete run of the algorithm for a small program.
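
The control structure of the workset algorithm can be exercised in isolation with the following Python sketch. It is a simplification under stated assumptions, not the full analysis: the local states are opaque names, and the tables `KIND`, `SUCC`, `RETURN_STATE` and `FINAL_STATE` are hypothetical stand-ins for the local semantics, for Update and for Final. They are filled in by hand for a small program in which an identity function is first called with a fresh continuation and then tail-called.

```python
# Hand-written stand-ins for the local semantics of a small example program.
KIND = {"I": "Entry", "s1": "Call", "s2": "Entry", "s3": "Exit-CEval",
        "s4": "CApply", "s5": "Exit-TC", "s6": "Entry", "s7": "Exit-CEval"}
SUCC = {"I": ["s1"], "s1": ["s2"], "s2": ["s3"], "s4": ["s5"], "s5": ["s6"], "s6": ["s7"]}
RETURN_STATE = {("s1", "s3"): "s4"}   # (call, exit) -> return state, replaces lines 26-32
FINAL_STATE = {"s7": "s8"}            # exit -> final state, replaces lines 34-35

def cfa2_workset(init):
    summary, callers, tcallers, final = set(), set(), set(), set()
    seen, work = {(init, init)}, [(init, init)]

    def propagate(a, b):
        if (a, b) not in seen:
            seen.add((a, b))
            work.append((a, b))

    def update(s1, s2, s3, s4):
        propagate(s1, RETURN_STATE[(s2, s4)])

    while work:
        s1, s2 = work.pop()
        kind = KIND[s2]
        if kind in ("Entry", "CApply", "Inner-CEval"):
            for s3 in SUCC[s2]:
                propagate(s1, s3)
        elif kind == "Call":
            for s3 in SUCC[s2]:
                propagate(s3, s3)
                callers.add((s1, s2, s3))
                for entry, s4 in list(summary):
                    if entry == s3:
                        update(s1, s2, s3, s4)
        elif kind == "Exit-CEval":
            if s1 == init:
                final.add(FINAL_STATE[s2])
            else:
                summary.add((s1, s2))
                for s3, s4, callee in list(callers):
                    if callee == s1:
                        update(s3, s4, s1, s2)
                for s3, s4, callee in list(tcallers):
                    if callee == s1:
                        propagate(s3, s2)     # cross-procedure summary
        elif kind == "Exit-TC":
            for s3 in SUCC[s2]:
                propagate(s3, s3)
                tcallers.add((s1, s2, s3))
                for entry, s4 in list(summary):
                    if entry == s3:
                        propagate(s1, s4)
    return seen, summary, callers, tcallers, final
```

Running `cfa2_workset("I")` records two summaries, one caller and one tail caller, and the cross-procedure edge from the initial state to the exit of the tail-called function is what produces the final state.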



5.4. Correctness of the workset algorithm. The local state space is finite, so there is a 
finite number of path and summary edges. We record edges as seen when we insert them in 
W, which ensures that no edge is inserted in W twice. Therefore, the algorithm terminates. 

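\[
\begin{array}{lll}
\text{Name} & \text{Kind} & \text{Value} \\
\hline
\tilde{\mathcal{I}}(pr) & \text{Entry} & ([\![(\lambda_2\ (\mathit{id}\ h)\ (\mathit{id}\ 1\ (\lambda_3\ (u)\ (\mathit{id}\ 2\ h))))]\!],\ \{[\![(\lambda_1\ (x\ k)\ (k\ x))]\!]\},\ \emptyset) \\
\tilde{\varsigma}_1 & \text{Call} & ([\![(\mathit{id}\ 1\ (\lambda_3\ (u)\ (\mathit{id}\ 2\ h)))]\!],\ [\mathit{id} \mapsto \{\lambda_1\}],\ \emptyset) \\
\tilde{\varsigma}_2 & \text{Entry} & (\lambda_1,\ \{1\},\ \emptyset) \\
\tilde{\varsigma}_3 & \text{Exit CEval} & ([\![(k\ x)]\!],\ [x \mapsto \{1\}],\ \emptyset) \\
\tilde{\varsigma}_4 & \text{CApply} & (\lambda_3,\ \{1\},\ [\mathit{id} \mapsto \{\lambda_1\}],\ \emptyset) \\
\tilde{\varsigma}_5 & \text{Exit tail call} & ([\![(\mathit{id}\ 2\ h)]\!],\ [\mathit{id} \mapsto \{\lambda_1\}][u \mapsto \{1\}],\ \emptyset) \\
\tilde{\varsigma}_6 & \text{Entry} & (\lambda_1,\ \{2\},\ \emptyset) \\
\tilde{\varsigma}_7 & \text{Exit CEval} & ([\![(k\ x)]\!],\ [x \mapsto \{2\}],\ \emptyset) \\
\tilde{\varsigma}_8 & \text{CApply (final state)} & (\mathit{halt},\ \{2\},\ \emptyset,\ \emptyset)
\end{array}
\]

\[
\begin{array}{lllll}
W & \text{Summary} & \text{Callers} & \text{TCallers} & \text{Final} \\
\hline
(\tilde{\mathcal{I}}(pr), \tilde{\mathcal{I}}(pr)) & \emptyset & \emptyset & \emptyset & \emptyset \\
(\tilde{\mathcal{I}}(pr), \tilde{\varsigma}_1) & \emptyset & \emptyset & \emptyset & \emptyset \\
(\tilde{\varsigma}_2, \tilde{\varsigma}_2) & \emptyset & (\tilde{\mathcal{I}}(pr), \tilde{\varsigma}_1, \tilde{\varsigma}_2) & \emptyset & \emptyset \\
(\tilde{\varsigma}_2, \tilde{\varsigma}_3) & \emptyset & (\tilde{\mathcal{I}}(pr), \tilde{\varsigma}_1, \tilde{\varsigma}_2) & \emptyset & \emptyset \\
(\tilde{\mathcal{I}}(pr), \tilde{\varsigma}_4) & (\tilde{\varsigma}_2, \tilde{\varsigma}_3) & (\tilde{\mathcal{I}}(pr), \tilde{\varsigma}_1, \tilde{\varsigma}_2) & \emptyset & \emptyset \\
(\tilde{\mathcal{I}}(pr), \tilde{\varsigma}_5) & (\tilde{\varsigma}_2, \tilde{\varsigma}_3) & (\tilde{\mathcal{I}}(pr), \tilde{\varsigma}_1, \tilde{\varsigma}_2) & \emptyset & \emptyset \\
(\tilde{\varsigma}_6, \tilde{\varsigma}_6) & (\tilde{\varsigma}_2, \tilde{\varsigma}_3) & (\tilde{\mathcal{I}}(pr), \tilde{\varsigma}_1, \tilde{\varsigma}_2) & (\tilde{\mathcal{I}}(pr), \tilde{\varsigma}_5, \tilde{\varsigma}_6) & \emptyset \\
(\tilde{\varsigma}_6, \tilde{\varsigma}_7) & (\tilde{\varsigma}_2, \tilde{\varsigma}_3) & (\tilde{\mathcal{I}}(pr), \tilde{\varsigma}_1, \tilde{\varsigma}_2) & (\tilde{\mathcal{I}}(pr), \tilde{\varsigma}_5, \tilde{\varsigma}_6) & \emptyset \\
(\tilde{\mathcal{I}}(pr), \tilde{\varsigma}_7) & (\tilde{\varsigma}_2, \tilde{\varsigma}_3), (\tilde{\varsigma}_6, \tilde{\varsigma}_7) & (\tilde{\mathcal{I}}(pr), \tilde{\varsigma}_1, \tilde{\varsigma}_2) & (\tilde{\mathcal{I}}(pr), \tilde{\varsigma}_5, \tilde{\varsigma}_6) & \emptyset \\
\emptyset & (\tilde{\varsigma}_2, \tilde{\varsigma}_3), (\tilde{\varsigma}_6, \tilde{\varsigma}_7) & (\tilde{\mathcal{I}}(pr), \tilde{\varsigma}_1, \tilde{\varsigma}_2) & (\tilde{\mathcal{I}}(pr), \tilde{\varsigma}_5, \tilde{\varsigma}_6) & \tilde{\varsigma}_8
\end{array}
\]

Figure 9: A complete run of CFA2. Note that $\lambda_1$ is applied twice and returns to the correct
context both times. The program evaluates to 2. For brevity, we first show all
reachable states and then refer to them by their names. $\tilde{\mathcal{I}}(pr)$ shows the whole
program; in the other states we abbreviate lambdas by their labels. All heaps are
$\emptyset$ because there are no heap variables. The rows of the table show the contents
of the sets at line 3 for each iteration. Seen contains all pairs entered in W.

We obviously cannot visit an infinite number of abstract states. To establish the
soundness of our analysis, we show that if a state $\hat{\varsigma}$ is reachable from $\hat{\mathcal{I}}(pr)$, then the algorithm
visits $|\hat{\varsigma}|_{al}$ (cf. theorem 5.3). For instance, CFA2 on len tells us that we reach program
point 5, not that we reach 5 with a stack of size 1, 2, 3, etc.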

Soundness guarantees that CFA2 does not miss any flows, but it may also add flows 
that do not happen in the abstract semantics. For example, a sound but useless algorithm 
would add all pairs of local states in Seen. We establish the completeness of CFA2 by 
proving that every visited edge corresponds to an abstract flow (cf. theorem 5.4), which 
means that there is no loss in precision when going from abstract to local states. 

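The theorems use two definitions. The first associates a state $\hat{\varsigma}$ with its corresponding
entry, i.e., the entry of the procedure that contains $\hat{\varsigma}$. The second finds all entries that
reach $CE_p(\hat{\varsigma})$ through tail calls. We include the proofs of the theorems in the appendix.

Definition 5.1. The Corresponding Entry $CE_p(\hat{\varsigma})$ of a state $\hat{\varsigma}$ in a path $p$ is:

• $\hat{\varsigma}$, if $\hat{\varsigma}$ is an Entry

• $\hat{\varsigma}_1$, if $\hat{\varsigma}$ is not an Entry, $\hat{\varsigma}_2$ is not an Exit-CEval, $p \equiv p_1 \leadsto \hat{\varsigma}_1 \leadsto^{*} \hat{\varsigma}_2 \leadsto \hat{\varsigma} \leadsto p_2$, and
$CE_p(\hat{\varsigma}_2) = \hat{\varsigma}_1$

• $\hat{\varsigma}_1$, if $\hat{\varsigma}$ is not an Entry, $p \equiv p_1 \leadsto \hat{\varsigma}_1 \leadsto^{+} \hat{\varsigma}_2 \leadsto \hat{\varsigma}_3 \leadsto^{+} \hat{\varsigma}_4 \leadsto \hat{\varsigma} \leadsto p_2$, $\hat{\varsigma}_2$ is a Call and $\hat{\varsigma}_4$ is
an Exit-CEval, $CE_p(\hat{\varsigma}_2) = \hat{\varsigma}_1$, and $\hat{\varsigma}_3 \in CE^{*}_p(\hat{\varsigma}_4)$

Definition 5.2. For a state $\hat{\varsigma}$ and a path $p$, $CE^{*}_p(\hat{\varsigma})$ is the smallest set such that:

• $CE_p(\hat{\varsigma}) \in CE^{*}_p(\hat{\varsigma})$

• $CE^{*}_p(\hat{\varsigma}_1) \subseteq CE^{*}_p(\hat{\varsigma})$, when $p \equiv p_1 \leadsto \hat{\varsigma}_1 \leadsto \hat{\varsigma}_2 \leadsto^{*} \hat{\varsigma} \leadsto p_2$, $\hat{\varsigma}_1$ is a Tail Call, $\hat{\varsigma}_2$ is an Entry,
and $\hat{\varsigma}_2 = CE_p(\hat{\varsigma})$

Theorem 5.3 (Soundness). If $p \equiv \hat{\mathcal{I}}(pr) \leadsto^{*} \hat{\varsigma}$ then, after summarization:

• if $\hat{\varsigma}$ is not a final state then $(|CE_p(\hat{\varsigma})|_{al}, |\hat{\varsigma}|_{al}) \in$ Seen

• if $\hat{\varsigma}$ is a final state then $|\hat{\varsigma}|_{al} \in$ Final

• if $\hat{\varsigma}$ is an Exit-CEval and $\hat{\varsigma}' \in CE^{*}_p(\hat{\varsigma})$ then $(|\hat{\varsigma}'|_{al}, |\hat{\varsigma}|_{al}) \in$ Seen

Theorem 5.4 (Completeness). After summarization:

• For each $(\tilde{\varsigma}_1, \tilde{\varsigma}_2)$ in Seen, there exist $\hat{\varsigma}_1$, $\hat{\varsigma}_2$ and $p$ such that $p \equiv \hat{\mathcal{I}}(pr) \leadsto^{*} \hat{\varsigma}_1 \leadsto^{*} \hat{\varsigma}_2$ and
$\tilde{\varsigma}_1 = |\hat{\varsigma}_1|_{al}$ and $\tilde{\varsigma}_2 = |\hat{\varsigma}_2|_{al}$ and $\hat{\varsigma}_1 \in CE^{*}_p(\hat{\varsigma}_2)$

• For each $\tilde{\varsigma}$ in Final, there exist $\hat{\varsigma}$ and $p$ such that $p \equiv \hat{\mathcal{I}}(pr) \leadsto^{*} \hat{\varsigma}$ and $\tilde{\varsigma} = |\hat{\varsigma}|_{al}$ and $\hat{\varsigma}$
is a final state.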



5.5. Complexity. A simple calculation shows that CFA2 is in EXPTIME. The size of the
domain of Heap is $n$ and the size of the range is $2^n$, so there are $2^{n^2}$ heaps. Similarly, there
are $2^{n^2}$ frames. The size of State is dominated by the size of CApply, which is $n \cdot 2^n \cdot 2^{n^2} \cdot 2^{n^2} =
O(n \cdot 2^{2n^2+n})$. The size of Seen is the product of the sizes of UApply and State, which is
$(n \cdot 2^n \cdot 2^{n^2}) \cdot (n \cdot 2^{2n^2+n}) = O(n^2 \cdot 2^{3n^2+2n})$.

The running time of the algorithm is bounded by the number of edges in W times
the cost of each iteration. W contains edges from Seen only, so its size is $O(n^2 \cdot 2^{3n^2+2n})$.
The most expensive iteration happens when line 19 is executed. There are $O(n^3 \cdot 2^{4n^2+2n})$
TCallers and for each one we call Propagate, which involves searching Seen. Therefore,
the loop costs $O(n^3 \cdot 2^{4n^2+2n}) \cdot O(n^2 \cdot 2^{3n^2+2n}) = O(n^5 \cdot 2^{7n^2+4n})$. Thus, the total cost of the
algorithm is $O(n^2 \cdot 2^{3n^2+2n}) \cdot O(n^5 \cdot 2^{7n^2+4n}) = O(n^7 \cdot 2^{10n^2+6n})$.
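
The exponent arithmetic above can be checked mechanically for small n. The sketch below is only a sanity check of the bounds; the decomposition of TCallers into two UApply components and one Eval component is an assumption of this sketch, chosen because it reproduces the stated bound.

```python
def sizes(n):
    """Upper bounds on the sizes of the state domains for a program of size n."""
    heaps = frames = 2 ** (n * n)            # n variables, 2^n possible values each
    uclos = 2 ** n                           # sets of user lambdas
    capply = n * uclos * frames * heaps      # CClos x UClos x Stack x Heap
    uapply = n * uclos * heaps               # ULam x UClos x Heap
    evals = n * frames * heaps               # Call x Stack x Heap
    seen = uapply * capply                   # edges from an entry to any state
    tcallers = uapply * evals * uapply       # (entry, tail call, entry) triples
    return capply, seen, tcallers

def total_cost(n):
    """Number of worklist edges times the cost of the most expensive iteration."""
    _, seen, tcallers = sizes(n)
    return seen * (tcallers * seen)
```

For every n, `total_cost(n)` equals $n^7 \cdot 2^{10n^2+6n}$, which matches the bound derived in the text.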

Showing that CFA2 is in EXPTIME does not guarantee the existence of a program that,
when analyzed, triggers the exponential behavior. Is there such a program? The answer
is yes. Consider the following program, suggested to us by Danny Dubé:

(let* ((merger (λ1(f) (λ2(_) f)))
       (_ (merger (λ3(x) x)))
       (clos (merger (λ4(y) y)))
       (f1 (clos _)^1)
       (_ (f1 _)^1')
       (f2 (clos _)^2)
       (_ (f2 _)^2')
       ...
       (fn (clos _)^n)
       (_ (fn _)^n'))
  _)

The idea is to create an exponential number of frames by exploiting the strong updates
CFA2 does on the top frame. The code is in direct style for brevity; the let-bound variables
would be bound by continuation lambdas in the equivalent CPS program. The only heap
reference appears in the body of $\lambda_2$. We use underscores for unimportant expressions.

The merger takes a function, binds f to it and returns a closure that ignores its
argument and returns f. We call the merger twice so that f is bound to $\{\lambda_3, \lambda_4\}$ in the heap.
Now clos is bound to $\lambda_2$ in the top frame and every call to clos returns $\{\lambda_3, \lambda_4\}$. Thus,
after call site 1 the variable f1 is bound to $\{\lambda_3, \lambda_4\}$. At 1', execution splits in two branches.
One calls $\lambda_3$ and filters the binding of f1 in the top frame to $\{\lambda_3\}$. The other calls $\lambda_4$ and
filters the binding to $\{\lambda_4\}$. Each branch will split in two more branches at call 2', etc. By
binding each fi to a set of two elements and applying it immediately, we force a strong
update and create exponentially many frames.
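
The branching described above can be mimicked with a few lines of Python. This is a toy model of the effect of stack filtering on the top frame, not a run of the analysis; the function name and the labels `"l3"` and `"l4"` are hypothetical stand-ins for the two lambdas that flow to each fi.

```python
def split_frames(n, flows=("l3", "l4")):
    """Enumerate the top frames that exist after the calls at sites 1', ..., n'."""
    frames = [{}]
    for i in range(1, n + 1):
        var = f"f{i}"
        # Call site i binds fi to the whole set of flows. Call site i' applies fi,
        # and stack filtering narrows the binding to the lambda that was called,
        # so every existing frame splits into one frame per lambda.
        frames = [{**tf, var: frozenset({lam})} for tf in frames for lam in flows]
    return frames
```

`split_frames(n)` contains $2^n$ pairwise distinct frames, one for each way of choosing a lambda for every fi.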

Even though strong update can be subverted, it can also speed up the analysis of some
programs by avoiding spurious flows. In compose-same (cf. sec. 3.2), if two lambdas $\lambda_1$ and
$\lambda_2$ flow to f, 0CFA will apply each lambda at each call site, resulting in four flows. CFA2
will only examine two flows, one that uses $\lambda_1$ in both call sites and one that uses $\lambda_2$.

We tried to keep the algorithm of Fig. 8 simple because it is meant to be a model.
There are many parameters one can tune to improve the performance and/or asymptotic
complexity of CFA2:

• no stack filtering: CFA2 is sound without stack filtering, but less precise. Permitting
fake rebinding may not be too harmful in practice. Suppose that a set $\{\lambda_1, \lambda_2\}$ flows to
a variable v with two stack references $v^{l}$ and $v^{l'}$. Even with stack filtering, both lambdas
will flow to both references. Stack filtering just prevents us from using $\lambda_1$ at $v^{l}$ and $\lambda_2$ at
$v^{l'}$ along the same execution path.

• heap widening: implementations of flow analyses rarely use one heap per state. They use
a global heap instead and states carry timestamps [26, ch. 5]. Heap is a lattice of height
$O(n^2)$. Since the global heap grows monotonically, it can change at most $O(n^2)$ times
during the analysis.

• summary reuse: we can avoid some reanalyzing of procedures by creating general
summaries that many callers can use. One option is to create more approximate summaries by
widening. Another option is to include only relevant parts of the state in the summary [4].

• representation of the sets: in calculating the exponential upper bound, we pessimistically 
assumed that looking up an element in a set takes time linear in the size of the set. This 
need not be true if one uses efficient data structures to represent Seen and the other sets. 

An in-depth study of the performance and complexity of the proposed variants would
increase our understanding of their relative merits. Also, we do not know if CFA2 has an
exponential lower bound. Our evaluation, presented in the next section, shows that CFA2
compares favorably to 0CFA, a cubic algorithm.

6. Evaluation 

We implemented CFA2, 0CFA and 1CFA for the Twobit Scheme compiler [6] and used
them to do constant propagation and folding. In this section we report on some initial
measurements and comparisons.

0CFA and 1CFA use a standard workset algorithm. CFA2 uses the algorithm of
section 5.3. To speed up the analyses, the variable environment and the heap are global.

Figure 10: Benchmark results



We compared the effectiveness of the analyses on a small set of benchmarks (Fig. 10).
We measured the number of stack and heap references in each program and the number
of constants found by each analysis. We also recorded what goes in the workset in each
analysis, i.e., the number of abstract states visited by 0CFA and 1CFA, and the number of
path and summary edges visited by CFA2. The running time of an abstract interpretation
is proportional to the number of items inserted in the workset.

We chose programs that exhibit a variety of control-flow patterns. Len computes the 
length of a list recursively. Rev-iter reverses a list tail-recursively. Len-Y computes the 
length of a list using the Y-combinator instead of explicit recursion. Tree-count counts 
the nodes in a binary tree. Ins-sort sorts a list of numbers using insertion-sort. DFS does 
depth-first search of a graph. Flatten turns arbitrarily nested lists into a flat list. Sets 
defines the basic set operations and tests De Morgan's laws on sets of numbers. Church-nums 
tests distributivity of multiplication over addition for a few Church numerals. 

CFA2 finds the most constants, followed by 1CFA. 0CFA is the least precise. CFA2 is
also more efficient at exploring its abstract state space. In five out of nine cases, it visits
fewer paths than 0CFA does states. The visited set of CFA2 can be up to 3.2 times smaller
(flatten), and up to 1.3 times larger (DFS) than the visited set of 0CFA. 1CFA is less
efficient than both 0CFA (9/9 cases) and CFA2 (8/9 cases). The visited set of 1CFA can be
significantly larger than that of CFA2 in some cases (15.6 times in tree-count, 14.4 times
in flatten, 12.8 times in sets).

Naturally, the number of stack references in a program is much higher than the number 
of heap references; most of the time, a variable is referenced only by the lambda that binds 
it. Thus, CFA2 uses the precise stack lookups more often than the imprecise heap lookups. 



7. Related work 



We were particularly influenced by Chaudhuri's paper on subcubic algorithms for recursive
state machines [5]. His clear and intuitive description of summarization helped us realize
that we can use this technique to explore the state space of CFA2.

Our workset algorithm is based on Sharir and Pnueli's functional approach [25, pg.
207] and the tabulation algorithm of Reps et al. [23], extended for tail recursion and
higher-order functions. In section 5.2 we mentioned that CFA2 essentially produces a pushdown
system. Then, the reader may wonder why we designed a new algorithm instead of using an
existing one like post* [3, 10]. The reason is that callers cannot be identified syntactically
in higher-order languages. Hence, algorithms that analyze higher-order programs must be
based on search. The tabulation algorithm can be changed to use search fairly naturally.
It is less clear to us how to do that for post*. In a way, CFA2 creates a pushdown system
and analyzes it at the same time, much like what k-CFA does with control-flow graphs.



Melski and Reps [17] reduced Heintze's set-constraints [13] to an instance of
context-free-language (abbrev. CFL) reachability, which they solve using summarization. Therefore,
their solution has the same precision as 0CFA.

CFL reachability has also been used for points-to analysis of imperative higher-order
languages. For instance, Sridharan and Bodík's points-to analysis for Java [28] uses CFL
reachability to match writes and reads to object fields. Precise call/return matching is
achieved only for programs without recursive methods. Hind's survey [14] discusses many
other variants of points-to analysis.

Debray and Proebsting [7] used ideas from parsing theory to design an interprocedural 
analysis for first-order programs with tail calls. They describe control-flow with a context- 
free grammar. Then, the FOLLOW set of a procedure represents its possible return points. 
Our approach is quite different on the surface, but similar in spirit; we handle tail calls by 
computing summaries transitively. 

Mossin [21] created a type-based flow analysis for functional languages, which uses
polymorphic subtyping for polyvariance. The input to the analysis is a program p in the
simply-typed λ-calculus with recursion. First, the analysis annotates the types in p with
labels. Then, it computes flow information by assigning labeled types to each expression in
p. Thus, flow analysis is reduced to a type-inference problem. The annotated type system
uses let-polymorphism. As a result, it can distinguish flows to different references of let-
and letrec-bound variables. In the following program, it finds that n2 is a constant.

(let* ((id (λ(x) x))
       (n1 (id 1))
       (n2 (id 2)))
  (+ n1 n2))

However, the type system merges flows to different references of λ-bound variables. For
instance, it cannot find that n2 is a constant in the app example of section 3.1. Mossin's
algorithm runs in time $O(n^8)$.



Rehof and Fähndrich [9, 22] used CFL reachability in an analysis that runs in cubic
time and has the same precision as Mossin's. They also extended the analysis to handle
polymorphism in the target language. Around the same time, Gustavsson and Svenningsson
[12] formulated a cubic version of Mossin's analysis without using CFL reachability. Their
work does not deal with polymorphism in the target language.

Midtgaard and Jensen [18] created a flow analysis for direct-style higher-order programs
that keeps track of "return flow". They point out that continuations make return-point
information explicit in CPS and show how to recover this information in direct-style programs.
Their work does not address the issue of unbounded call/return matching.

Earl et al. followed up on CFA2 with a pushdown analysis that does not use frames [8].
Rather, it allocates all bindings in the heap with context, in the style of k-CFA [26]. For
k = 0, their analysis runs in time $O(n^6)$, where n is the size of the program. Like all
pushdown-reachability algorithms, Earl et al.'s analysis records pairs of states $(\hat{\varsigma}_1, \hat{\varsigma}_2)$ where
$\hat{\varsigma}_2$ is same-context reachable from $\hat{\varsigma}_1$. However, their algorithm does not classify states as
entries, exits, calls, etc. This has two drawbacks compared to the tabulation algorithm.
First, they do not distinguish between path and summary edges. Thus, they have to search
the whole set of edges when they look for return points, even though only summaries can
contribute to the search. More importantly, path edges are only a small subset of the set S of
all edges between same-context reachable states. By not classifying states, their algorithm
maintains the whole set S, not just the path edges. In other words, it records edges whose
source is not an entry. In the graph of len, some of these edges are (6, 8), (6, 13), (7, 11).
Such edges slow down the analysis and do not contribute to call/return matching, because
they cannot evolve into summary edges.
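
The size gap between path edges and the full set S can be illustrated on a small graph. The sketch below is a hedged illustration on a hypothetical four-node procedure body (it is not the graph of len, and it ignores calls and returns entirely): it compares the edges whose source is the entry with the edges between all pairs of reachable nodes. The helper `reachable_pairs` and the node names are assumptions of this example.

```python
def reachable_pairs(succ, sources):
    """All pairs (src, n) such that n is reachable from src in zero or more steps."""
    pairs = set()
    for src in sources:
        seen = {src}
        stack = [src]
        while stack:
            node = stack.pop()
            for nxt in succ.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        pairs |= {(src, n) for n in seen}
    return pairs


# Hypothetical straight-line procedure: entry -> a -> b -> exit.
succ = {"entry": ["a"], "a": ["b"], "b": ["exit"], "exit": []}

path_edges = reachable_pairs(succ, ["entry"])  # source is always the entry
all_edges = reachable_pairs(succ, list(succ))  # the whole set S

print(len(path_edges), len(all_edges))
```

On this chain the path edges grow linearly with the number of nodes, while S grows quadratically; an edge such as `("a", "exit")` is in S but is not a path edge.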

In CFA2, it is possible to disable the use of frames by classifying each reference as 
a heap reference. The resulting analysis has similar precision to Earl et al.'s analysis for
k = 0. We conjecture that this variant is not a viable alternative in practice, because of the 
significant loss in precision. 

Might and Shivers [20] proposed ΓCFA (abstract garbage collection) and μCFA (abstract
counting) to increase the precision of k-CFA. ΓCFA removes unreachable bindings
from the variable environment, and μCFA counts how many times a variable is bound during
the analysis. The two techniques combined reduce the number of spurious flows and give
precise environment information. Stack references in CFA2 have a similar effect, because
different calls to the same function use different frames. However, we can utilize ΓCFA and
μCFA to improve precision in the heap.

Recently, Kobayashi [15] proposed a way to statically verify properties of typed higher-order
programs using model-checking. He models a program by a higher-order recursion
scheme $\mathcal{G}$, expresses the property of interest in the modal μ-calculus and checks if the infinite
tree generated by $\mathcal{G}$ satisfies the property. This technique can do flow analysis, since flow
analysis can be encoded as a model-checking problem. The target language of this work
is the simply-typed lambda calculus. Programs in a Turing-complete language must be
approximated in the simply-typed lambda calculus in order to be model-checked.



8. Conclusions 

In this paper we propose CFA2, a pushdown model of higher-order programs, and prove
it correct. CFA2 provides precise call/return matching and has a better approach to variable
binding than k-CFA. Our evaluation shows that CFA2 gives more accurate data-flow
information than 0CFA and 1CFA.

Stack lookups make CFA2 polyvariant because different calls to the same function are
analyzed in different environments. We did not add polyvariance in the heap to keep the
presentation simple. Heap polyvariance is orthogonal to call/return matching; integrating
existing techniques [1, 26, 31] in CFA2 should raise no difficulties. For example, CFA2 can
be extended with call-strings polyvariance, like k-CFA, to produce a family of analyses
CFA2.0, CFA2.1 and so on. Then, any instance of CFA2.k would be strictly more precise
than the corresponding instance of k-CFA.

We believe that pushdown models are a better tool for higher-order flow analysis than 
control-flow graphs, and are working on providing more empirical support to this thesis. 
We plan to use CFA2 for environment analysis and stack-related optimizations. We also 
plan to add support for call/cc in CFA2.

Appendix A.

We use the notation $\pi_i(\langle e_1, \dots, e_n \rangle)$ to retrieve the $i$-th element of a tuple $\langle e_1, \dots, e_n \rangle$. Also,
we write $\mathcal{L}(g)$ to get the label of a term $g$.

Earlier, we mentioned that labels in a program can be split into disjoint sets
according to the innermost user lambda that contains them. The "label to label" map
$LL(\psi)$ returns the labels that are in the same set as $\psi$. For example, in the program
$(\lambda_1 (x\ k1)\ (k1\ (\lambda_2 (y\ k2)\ (x\ y\ (\lambda_3 (u)\ (x\ u\ k2)^4))^5))^6)$, these sets are $\{1, 6\}$ and $\{2, 3, 4, 5\}$,
so we know $LL(4) = \{2, 3, 4, 5\}$ and $LL(6) = \{1, 6\}$.
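
The LL map can be sketched in a few lines of Python. The tuple encoding of terms below (`"ulam"`, `"clam"`, `"call"`) and the helper `label_groups` are assumptions made for this illustration, not part of CFA2; variable references are left unlabeled for simplicity.

```python
# Hypothetical term encoding (an assumption of this sketch):
#   variable            -> "x"
#   user lambda         -> ("ulam", label, params, body)
#   continuation lambda -> ("clam", label, params, body)
#   call                -> ("call", label, [subterms])
PROGRAM = ("ulam", 1, ("x", "k1"),
           ("call", 6, ["k1",
               ("ulam", 2, ("y", "k2"),
                   ("call", 5, ["x", "y",
                       ("clam", 3, ("u",),
                           ("call", 4, ["x", "u", "k2"]))]))]))


def label_groups(term, owner=None, groups=None):
    """Group labels by the innermost user lambda that contains them."""
    if groups is None:
        groups = {}
    if isinstance(term, str):      # variables carry no label in this sketch
        return groups
    kind, label = term[0], term[1]
    if kind == "ulam":
        owner = label              # a user lambda starts a new group
    groups.setdefault(owner, set()).add(label)
    children = [term[3]] if kind in ("ulam", "clam") else term[2]
    for child in children:
        label_groups(child, owner, groups)
    return groups


def LL(program, psi):
    """Labels that are in the same set as the label psi."""
    for group in label_groups(program).values():
        if psi in group:
            return group
    raise KeyError(psi)


print(sorted(LL(PROGRAM, 4)))
print(sorted(LL(PROGRAM, 6)))
```

On the example program this reproduces the two sets given in the text.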

Definition A.1. For every term $g$, the map $BV(g)$ returns the variables bound by lambdas
which are subterms of $g$. The map has a simple inductive definition:

$BV([\![(\lambda_\psi (v_1 \dots v_n)\ call)]\!]) = \{v_1, \dots, v_n\} \cup BV(call)$
$BV([\![(g_1 \dots g_n)^\psi]\!]) = BV(g_1) \cup \dots \cup BV(g_n)$
$BV(v) = \emptyset$ □
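
The inductive definition translates directly into a recursive function. This is a minimal sketch, assuming a hypothetical tuple encoding of terms (variables as strings, lambdas as `(kind, label, params, body)`, calls as `("call", label, subterms)`); the encoding is an assumption of the example.

```python
def BV(term):
    """Variables bound by lambdas that are subterms of term."""
    if isinstance(term, str):                 # BV(v) is empty
        return set()
    kind = term[0]
    if kind in ("ulam", "clam"):              # lambda: its parameters plus BV(body)
        _, _label, params, body = term
        return set(params) | BV(body)
    _, _label, subterms = term                # call: union over the subterms
    result = set()
    for sub in subterms:
        result |= BV(sub)
    return result


# (λ1 (x k1) (k1 (λ2 (y k2) (x y (λ3 (u) (x u k2)^4))^5))^6)
PROGRAM = ("ulam", 1, ("x", "k1"),
           ("call", 6, ["k1",
               ("ulam", 2, ("y", "k2"),
                   ("call", 5, ["x", "y",
                       ("clam", 3, ("u",),
                           ("call", 4, ["x", "u", "k2"]))]))]))

print(sorted(BV(PROGRAM)))
```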

We assume that CFA2 works on an alphatized program, i.e., a program where all
variables have distinct names. Thus, if $(\lambda_\psi (v_1 \dots v_n)\ call)$ is a term in such a program, we
know that no other lambda in that program binds variables with names $v_1, \dots, v_n$. (During
execution of CFA2, we do not rename any variables.) The following lemma is a simple
consequence of alphatization.

Lemma A.2. A concrete state $\varsigma$ has the form $(\dots, ve, t)$.

(1) For any closure $(lam, \beta) \in range(ve)$, it holds that $dom(\beta) \cap BV(lam) = \emptyset$.

(2) If $\varsigma$ is an Eval with call site $call$ and environment $\beta$, then $dom(\beta) \cap BV(call) = \emptyset$.

(3) If $\varsigma$ is an Apply, for any closure $(lam, \beta)$ in operator or argument position, then
$dom(\beta) \cap BV(lam) = \emptyset$.

Proof. We show that the lemma holds for the initial state $\mathcal{I}(pr)$. Then, for each transition
$\varsigma \to \varsigma'$, we assume that $\varsigma$ satisfies the lemma and show that $\varsigma'$ also satisfies it.

• $\mathcal{I}(pr)$ is a UApply of the form $((pr, \emptyset), (lam, \emptyset), halt, \emptyset, \langle\rangle)$. Since $ve$ is empty, (1) trivially
holds. Also, both closures have an empty environment so (3) holds.

• The [UEA] transition is:

$([\![(f\ e\ q)^l]\!], \beta, ve, t) \to (proc, d, c, ve, l :: t)$
$proc = \mathcal{A}(f, \beta, ve)$
$d = \mathcal{A}(e, \beta, ve)$
$c = \mathcal{A}(q, \beta, ve)$

The $ve$ doesn't change in the transition, so (1) holds for $\varsigma'$.

The operator is a closure of the form $(lam, \beta')$. We must show that $dom(\beta') \cap BV(lam) = \emptyset$.
If $Lam?(f)$, then $lam = f$ and $\beta' = \beta$. Also, we know
$dom(\beta) \cap BV([\![(f\ e\ q)^l]\!]) = \emptyset$
$\Rightarrow dom(\beta) \cap (BV(f) \cup BV(e) \cup BV(q)) = \emptyset$
$\Rightarrow dom(\beta) \cap BV(f) = \emptyset$.

If $Var?(f)$, then $(lam, \beta') \in range(ve)$, so we get the desired result because $ve$ satisfies
(1).

Similarly for $d$ and $c$.

• The [UAE] transition is:

$(proc, d, c, ve, t) \to (call, \beta', ve', t)$
$proc = ([\![(\lambda_l (u\ k)\ call)]\!], \beta)$
$\beta' = \beta[u \mapsto t][k \mapsto t]$
$ve' = ve[(u, t) \mapsto d][(k, t) \mapsto c]$

To show (1) for $ve'$, it suffices to show that $d$ and $c$ don't violate the property. The user argument
$d$ is of the form $(lam_1, \beta_1)$. Since $\varsigma$ satisfies (3), we know $dom(\beta_1) \cap BV(lam_1) = \emptyset$,
which is the desired result. Similarly for $c$.

Also, we must show that $\varsigma'$ satisfies (2). We know $\{u, k\} \cap BV(call) = \emptyset$ because the program
is alphatized. Also, from property (3) for $\varsigma$ we know $dom(\beta) \cap BV([\![(\lambda_l (u\ k)\ call)]\!]) = \emptyset$,
which implies $dom(\beta) \cap BV(call) = \emptyset$. We must show
$dom(\beta') \cap BV(call) = \emptyset$
$\Leftrightarrow (dom(\beta) \cup \{u, k\}) \cap BV(call) = \emptyset$
$\Leftrightarrow (dom(\beta) \cap BV(call)) \cup (\{u, k\} \cap BV(call)) = \emptyset$
$\Leftrightarrow \emptyset \cup \emptyset = \emptyset$.

• Similarly for the other two transitions. □
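
The alphatization assumption that the proof relies on can be checked mechanically. A minimal sketch, assuming a hypothetical tuple encoding of terms (variables as strings, lambdas as `(kind, label, params, body)`, calls as `("call", label, subterms)`); the helper names are illustrative.

```python
def binders(term):
    """List every binding occurrence of a variable, with repetitions."""
    if isinstance(term, str):
        return []
    if term[0] in ("ulam", "clam"):
        return list(term[2]) + binders(term[3])
    names = []
    for sub in term[2]:
        names += binders(sub)
    return names


def is_alphatized(term):
    """True iff no variable name is bound by more than one lambda parameter."""
    names = binders(term)
    return len(names) == len(set(names))


# (λ1 (x k1) (x x (λ2 (u) (k1 u)^3))^4): every binder is distinct.
GOOD = ("ulam", 1, ("x", "k1"),
        ("call", 4, ["x", "x", ("clam", 2, ("u",), ("call", 3, ["k1", "u"]))]))
# Same shape, but the continuation lambda rebinds x.
BAD = ("ulam", 1, ("x", "k1"),
       ("call", 4, ["x", "x", ("clam", 2, ("x",), ("call", 3, ["k1", "x"]))]))

print(is_alphatized(GOOD), is_alphatized(BAD))
```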

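Theorem A.3 (Simulation). If $\varsigma \to \varsigma'$ and $|\varsigma|_{ca} \sqsubseteq \hat{\varsigma}$, then there exists $\hat{\varsigma}'$ such that $\hat{\varsigma} \leadsto \hat{\varsigma}'$
and $|\varsigma'|_{ca} \sqsubseteq \hat{\varsigma}'$.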

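Proof. By cases on the concrete transition.
a) Rule [UEA]

$([\![(f\ e\ q)^l]\!], \beta, ve, t) \to (proc, d, c, ve, l :: t)$
$proc = \mathcal{A}(f, \beta, ve)$
$d = \mathcal{A}(e, \beta, ve)$
$c = \mathcal{A}(q, \beta, ve)$

Let $ts = toStack(LV(l), \beta, ve)$. Since $|\varsigma|_{ca} \sqsubseteq \hat{\varsigma}$, $\hat{\varsigma}$ is of the form $([\![(f\ e\ q)^l]\!], st, h)$, where
$|ve|_{ca} \sqsubseteq h$ and $ts \sqsubseteq st$.

The abstract transition is
$([\![(f\ e\ q)^l]\!], st, h) \leadsto (f', \hat{d}, \hat{c}, st', h)$
$f' \in \hat{\mathcal{A}}_u(f, l, st, h)$
$\hat{d} = \hat{\mathcal{A}}_u(e, l, st, h)$
$\hat{c} = \hat{\mathcal{A}}_k(q, st)$
$st' = \begin{cases} pop(st) & Var?(q) \\ st & Lam?(q) \wedge (H?(l, f) \vee Lam?(f)) \\ st[f \mapsto \{f'\}] & Lam?(q) \wedge S?(l, f) \end{cases}$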
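
The three-way definition of $st'$ can be read as a small function on stacks. The sketch below is an illustration under assumptions: a stack is a Python list of frames with the top frame at index 0, a frame is a dict, and the predicates $Var?(q)$, $Lam?(f)$ and $S?(l, f)$ are passed in as booleans. None of these representation choices come from the paper.

```python
def uea_next_stack(st, f, f_prime, q_is_var, f_is_lam, f_is_stack_ref):
    """Stack of the successor state of an abstract [UEA] transition."""
    if q_is_var:
        # Tail call: the continuation is a variable, so pop the top frame.
        return st[1:]
    if f_is_lam or not f_is_stack_ref:
        # Operator is a lambda or a heap reference: the stack is unchanged.
        return st
    # Operator is a stack reference: narrow its binding to the chosen f'.
    top = dict(st[0])
    top[f] = {f_prime}
    return [top] + st[1:]


st = [{"f": {"lam_a", "lam_b"}, "k": "halt"}, {"g": {"lam_c"}}]
print(uea_next_stack(st, "f", "lam_a", q_is_var=False,
                     f_is_lam=False, f_is_stack_ref=True))
```

The third branch copies the top frame rather than mutating it, so the predecessor state's stack stays intact while the successor sees `f` bound only to the lambda that was actually called.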

State $\hat{\varsigma}$ has many possible successors, one for each lambda in $\hat{\mathcal{A}}_u(f, l, st, h)$. We must
show that one of them is a state $\hat{\varsigma}'$ such that $|\varsigma'|_{ca} \sqsubseteq \hat{\varsigma}'$.

The variable environment and the heap don't change in the transitions, so for $\varsigma'$ and $\hat{\varsigma}'$
we know that $|ve|_{ca} \sqsubseteq h$. We must show $\pi_1(proc) = f'$, $|d|_{ca} \sqsubseteq \hat{d}$, $|c|_{ca} \sqsubseteq \hat{c}$ and $ts' \sqsubseteq st'$,
where $ts'$ is the stack of $|\varsigma'|_{ca}$.
We first show $\pi_1(proc) = f'$, by cases on $f$:
• $Lam?(f)$

Then, $proc = (f, \beta)$ and $f' \in \{f\}$, so $f' = f$.

• $S?(l, f)$

Then, $proc = ve(f, \beta(f))$, a closure of the form $(lam, \beta')$. Since $ts(f) = |ve(f, \beta(f))|_{ca} = \{lam\}$
and $ts \sqsubseteq st$, we get $lam \in st(f)$. So, we pick $f'$ to be $lam$.

• $H?(l, f)$

Then, $proc = ve(f, \beta(f))$, a closure of the form $(lam, \beta')$. Since $|ve|_{ca} \sqsubseteq h$ and $lam \in |ve|_{ca}(f)$,
we get $lam \in h(f)$. So, we pick $f'$ to be $lam$.



We now show $|c|_{ca} \sqsubseteq \hat{c}$, by cases on $q$:

• $Lam?(q)$

Then, $c = (q, \beta)$ and $\hat{c} = q$, so $|c|_{ca} \sqsubseteq \hat{c}$.

• $Var?(q)$ and $c = ve(q, \beta(q)) = halt$

Then, $ts(q) = halt$. Since $ts \sqsubseteq st$, we get $st(q) = halt$. Thus, $\hat{c} = halt$.

• $Var?(q)$ and $c = ve(q, \beta(q)) = (lam, \beta')$
Similar to the previous case.

It remains to show that $ts' \sqsubseteq st'$. We proceed by cases on $q$ and $f$:

• $Var?(q)$ and $c = ve(q, \beta(q)) = halt$

Then, $ts' = \langle\rangle$. By $ts \sqsubseteq st$, we know that $ts$ and $st$ have the same size. Also,
$st' = pop(st)$, thus $st' = \langle\rangle$. Therefore, $ts' \sqsubseteq st'$.

• $Var?(q)$ and $c = ve(q, \beta(q)) = (lam, \beta')$

By Fig. 5 we know that $ts' = toStack(LV(\mathcal{L}(lam)), \beta', ve) = pop(ts)$. Also, $st' = pop(st)$.
Thus, to show $ts' \sqsubseteq st'$ it suffices to show $pop(ts) \sqsubseteq pop(st)$, which holds
because $ts \sqsubseteq st$.

• $Lam?(q) \wedge (Lam?(f) \vee H?(l, f))$
Then, $ts' = ts$ and $st' = st$, so $ts' \sqsubseteq st'$.

• $Lam?(q) \wedge S?(l, f)$

By $LV(\mathcal{L}(q)) = LV(l)$ we get that $ts' = ts$. Also, $proc = ve(f, \beta(f))$, a closure of the
form $(lam, \beta')$. We pick $f'$ to be $lam$. The stack of $\hat{\varsigma}'$ is $st' = st[f \mapsto \{lam\}]$. Since
$pop(ts) \sqsubseteq pop(st)$, we only need to show that the top frames of $ts'$ and $st'$ are in $\sqsubseteq$. For
this, it suffices to show that $ts'(f) \sqsubseteq st'(f)$, which holds because $ts'(f) = ts(f) = \{lam\}$.

b) Rule [UAE]

$(proc, d, c, ve, t) \to (call, \beta', ve', t)$
$proc = ([\![(\lambda_l (u\ k)\ call)]\!], \beta)$
$\beta' = \beta[u \mapsto t][k \mapsto t]$
$ve' = ve[(u, t) \mapsto d][(k, t) \mapsto c]$



Showing $|d|_{ca} \sqsubseteq \hat{d}$ is similar.
Let $ts'$ be the stack of $|\varsigma'|_{ca}$. The innermost user lambda that contains $call$ is $\lambda_l$, therefore
$ts' = toStack(LV(l), \beta', ve')$. We must show that $|\varsigma'|_{ca} \sqsubseteq \hat{\varsigma}'$, i.e., $ts' \sqsubseteq st'$ and $|ve'|_{ca} \sqsubseteq h'$.

We assume that $c = (lam, \beta_1)$ and that $H?(u)$ holds, the other cases are simpler. In
this case, $|ve'|_{ca}$ is the same as $|ve|_{ca}$ except that $|ve'|_{ca}(u) = |ve|_{ca}(u) \cup |d|_{ca}$. Also,
$h'(u) = h(u) \cup \hat{d}$, thus $|ve'|_{ca} \sqsubseteq h'$.

We know that $\beta'$ contains bindings for $u$ and $k$, and by lemma A.2 it doesn't bind any
variables in $BV(call)$. Since $LV(l) \setminus \{u, k\} = BV(call)$, $\beta'$ doesn't bind any variables in
$LV(l) \setminus \{u, k\}$. Thus, the top frame of $ts'$ is $[u \mapsto |d|_{ca}][k \mapsto |c|_{ca}]$. The top frame of $st'$
is $[u \mapsto \hat{d}][k \mapsto \hat{c}]$, therefore the frames are in $\sqsubseteq$. To complete the proof of $ts' \sqsubseteq st'$, we
must show that $pop(ts') \sqsubseteq pop(st')$
$\Leftrightarrow pop(ts') \sqsubseteq st$
$\Leftarrow pop(ts') = ts$.

We know $pop(ts') = toStack(LV(\mathcal{L}(lam)), \beta_1, ve')$ and $ts = toStack(LV(\mathcal{L}(lam)), \beta_1, ve)$. By
the temporal consistency of states (cf. [19] definition 4.4.5), $pop(ts')$ won't contain the
two bindings born at time $t$ because they are younger than all bindings in $\beta_1$. This implies
that $pop(ts') = ts$.



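c) Rule [CEA]

$([\![(q\ e)^\gamma]\!], \beta, ve, t) \to (proc, d, ve, \gamma :: t)$
$proc = \mathcal{A}(q, \beta, ve)$
$d = \mathcal{A}(e, \beta, ve)$

Let $ts = toStack(LV(\gamma), \beta, ve)$. Since $|\varsigma|_{ca} \sqsubseteq \hat{\varsigma}$, $\hat{\varsigma}$ is of the form $([\![(q\ e)^\gamma]\!], st, h)$, where
$|ve|_{ca} \sqsubseteq h$ and $ts \sqsubseteq st$. The abstract transition is

$([\![(q\ e)^\gamma]\!], st, h) \leadsto (q', \hat{d}, st', h)$
$q' = \hat{\mathcal{A}}_k(q, st)$
$\hat{d} = \hat{\mathcal{A}}_u(e, \gamma, st, h)$
$st' = \begin{cases} pop(st) & Var?(q) \\ st & Lam?(q) \end{cases}$

Let $ts'$ be the stack of $|\varsigma'|_{ca}$. We must show that $|\varsigma'|_{ca} \sqsubseteq \hat{\varsigma}'$, i.e., $|proc|_{ca} = q'$, $|d|_{ca} \sqsubseteq \hat{d}$,
and $ts' \sqsubseteq st'$.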

We first show $|proc|_{ca} = q'$, by cases on $q$:

• $Lam?(q)$

Then, $proc = (q, \beta)$ and $q' = q$. Thus, $|proc|_{ca} = q'$.

• $Var?(q)$ and $proc = ve(q, \beta(q)) = (lam, \beta_1)$

Since $q \in LV(\gamma)$ we get $ts(q) = lam$. From the latter and $ts \sqsubseteq st$, we get $st(q) = lam$,
which implies $q' = lam$, which implies $|proc|_{ca} = q'$.

• $Var?(q)$ and $proc = ve(q, \beta(q)) = halt$
Similar to the previous case.

Showing $|d|_{ca} \sqsubseteq \hat{d}$ is similar, by cases on $e$.
Last, we show $ts' \sqsubseteq st'$, by cases on $q$:

• $Lam?(q)$

Then, $st' = st$. Also, $ts' = toStack(LV(\mathcal{L}(q)), \beta, ve)$ and $LV(\mathcal{L}(q)) = LV(\gamma)$. Thus,
$ts' = ts$, which implies $ts' \sqsubseteq st'$.

• $Var?(q)$ and $proc = ve(q, \beta(q)) = (lam, \beta_1)$

Then, $ts' = toStack(LV(\mathcal{L}(lam)), \beta_1, ve) = pop(ts)$ and $st' = pop(st)$. To show $ts' \sqsubseteq st'$,
it suffices to show $pop(ts) \sqsubseteq pop(st)$, which holds by $ts \sqsubseteq st$.

• $Var?(q)$ and $proc = ve(q, \beta(q)) = halt$
Similar to the previous case.

d) Rule [CAE]

This case requires arguments similar to the previous cases. □

Lemma A.4. On an Eval-to-Apply transition, the stack below the top frame is irrelevant.
Formally,

• If $([\![(f\ e\ lam)^l]\!], tf :: st, h) \leadsto (ulam, \hat{d}, lam, tf' :: st, h)$ then for any $st'$,
$([\![(f\ e\ lam)^l]\!], tf :: st', h) \leadsto (ulam, \hat{d}, lam, tf' :: st', h)$

• If $([\![(f\ e\ k)^l]\!], tf :: st, h) \leadsto (ulam, \hat{d}, \hat{c}, st, h)$ then for any $st'$,
$([\![(f\ e\ k)^l]\!], tf :: st', h) \leadsto (ulam, \hat{d}, \hat{c}, st', h)$

• Similarly for rule [CEA]. □

Lemma A.5. On an Apply-to-Eval transition, the stack is irrelevant. Formally,

• If $([\![(\lambda_l (u\ k)\ call)]\!], \hat{d}, \hat{c}, st, h) \leadsto (call, [u \mapsto \hat{d}][k \mapsto \hat{c}] :: st, h')$ then for any $st'$,
$([\![(\lambda_l (u\ k)\ call)]\!], \hat{d}, \hat{c}, st', h) \leadsto (call, [u \mapsto \hat{d}][k \mapsto \hat{c}] :: st', h')$

• Similarly for rule [CAE], where $st'$ is any non-empty stack. □

Definition A.6 (Push Monotonicity).

Let $p \equiv \hat{\varsigma}_e \leadsto^* \hat{\varsigma}$ where $\hat{\varsigma}_e$ is an entry with stack $st_e$. The path $p$ is push monotonic iff
every transition $\hat{\varsigma}_1 \leadsto \hat{\varsigma}_2$ satisfies the following property:

If the stack of $\hat{\varsigma}_1$ is $st_e$ then the transition can only push the stack, it cannot
pop or modify the top frame. □

Push monotonicity is a property of paths, not of individual transitions. A push monotonic 
path can contain transitions that pop, as long as the stack never shrinks below the stack 
of the initial state of the path. The following properties are simple consequences of push 
monotonicity. 

Property A. 7. The stack of the first state in a push-monotonic path is a suffix of the stack 
of every other state in the path. 

Property A. 8. In a push-monotonic path, the number of pushes is greater than or equal 
to the number of pops. 
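
Property A.7 gives a simple executable check on a sequence of stacks. The sketch below tests only that property, which is a consequence of push monotonicity (a necessary condition, not the full definition). Stacks are assumed to be tuples of frames with the top frame first, and frames are stood in for by strings; both choices are assumptions of the example.

```python
def is_suffix(base, stack):
    """True iff base is a suffix of stack (top of stack at index 0)."""
    return len(stack) >= len(base) and tuple(stack[len(stack) - len(base):]) == tuple(base)


def satisfies_property_a7(stacks):
    """Check that the first stack is a suffix of every stack on the path."""
    base = stacks[0]
    return all(is_suffix(base, st) for st in stacks)


# Pushes and pops above the initial stack ("e",) are fine ...
ok_path = [("e",), ("a", "e"), ("b", "a", "e"), ("a", "e"), ("e",)]
# ... but popping below it, or replacing its top frame, is not.
popped = [("e",), ()]
modified = [("e",), ("x",)]

print(satisfies_property_a7(ok_path),
      satisfies_property_a7(popped),
      satisfies_property_a7(modified))
```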

The following lemma associates entries with "same-level reachable" states. A state $\hat{\varsigma}$ is
same-level reachable from an entry $\hat{\varsigma}_e$ if it is in the procedure whose entry is $\hat{\varsigma}_e$ or if it is
in some procedure that can be reached from $\hat{\varsigma}_e$ through tail calls, i.e., without growing the
stack.

Lemma A.9 (Same-level reachability).

Let $\hat{\varsigma}_e = ([\![(\lambda_l (u\ k)\ call)]\!], \hat{d}, \hat{c}, st_e, h_e)$, $\hat{\varsigma} = (\dots, st, h)$, and $p \equiv \hat{\varsigma}_e \leadsto^* \hat{\varsigma}$ where $\hat{\varsigma}_e \in CE^*_p(\hat{\varsigma})$.
Then,

(1) If $\hat{\varsigma}$ is an entry, $st = st_e$.

(2) If $\hat{\varsigma}$ is not an entry,

(a) $st$ is of the form $tf :: st_e$, for some frame $tf$.

(b) there exists $k'$ such that $tf(k') = \hat{c}$.

(c) if $\hat{\varsigma}_e = CE_p(\hat{\varsigma})$ then $dom(tf) \subseteq LV(l)$, $tf(u) \sqsubseteq \hat{d}$ and $tf(k) = \hat{c}$.
Moreover, if $\hat{\varsigma}$ is an Eval over call site $\psi$ then $\psi \in LL(l)$, and if $\hat{\varsigma}$ is a CApply over
$[\![(\lambda_\gamma (u')\ call')]\!]$ then $\gamma \in LL(l)$.

(3) $p$ is push monotonic.

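Proof. By induction on the length $|p|$ of $p$. Note that (3) follows from the form of the stack
in (1) and (2), so we won't prove it separately.
Basecase:

If $|p| = 0$, then $\hat{\varsigma} = \hat{\varsigma}_e$ so $st = st_e$.
Inductive step:

If $|p| > 0$, there are two cases; either $\hat{\varsigma}_e = CE_p(\hat{\varsigma})$ or $\hat{\varsigma}_e \neq CE_p(\hat{\varsigma})$.
a) $\hat{\varsigma}_e = CE_p(\hat{\varsigma})$

Since $|p| > 0$, $\hat{\varsigma}$ is not an entry, so the second or the third branch of the definition of $CE_p$
determines the shape of $p$.
a1) $p \equiv \hat{\varsigma}_e \leadsto^* \hat{\varsigma}' \leadsto \hat{\varsigma}$

Here, the predecessor $\hat{\varsigma}'$ of $\hat{\varsigma}$ is not a CEval exit, and $\hat{\varsigma}_e = CE_p(\hat{\varsigma}')$. We proceed by
cases on $\hat{\varsigma}'$. Note that $\hat{\varsigma}'$ cannot be a UEval because then $\hat{\varsigma}$ is an entry, so $\hat{\varsigma} = CE_p(\hat{\varsigma})$,
and our assumption that $\hat{\varsigma}_e = CE_p(\hat{\varsigma})$ breaks.

a1.1) $\hat{\varsigma}'$ is an inner CEval

Then, $\hat{\varsigma}' = ([\![((\lambda_\gamma (u')\ call')\ e')^{\gamma'}]\!], st', h')$. By IH, $st' = tf' :: st_e$,
$dom(tf') \subseteq LV(l)$, $tf'(u) \sqsubseteq \hat{d}$, $tf'(k) = \hat{c}$ and $\gamma' \in LL(l)$. By the abstract semantics,
$\hat{\varsigma} = ([\![(\lambda_\gamma (u')\ call')]\!], \hat{d}', st', h')$ where $\hat{d}' = \hat{\mathcal{A}}_u(e', \gamma', st', h')$. We know that $\gamma \in LL(l)$
because $\gamma' \in LL(l)$. Also, the stack is unchanged in the transition. Thus, (2a), (2b)
and (2c) hold for $\hat{\varsigma}$.

a1.2) $\hat{\varsigma}'$ is a CApply

Then, $\hat{\varsigma}' = ([\![(\lambda_\gamma (u')\ call')]\!], \hat{d}', st', h')$. By IH, $st' = tf' :: st_e$, $dom(tf') \subseteq LV(l)$,
$tf'(u) \sqsubseteq \hat{d}$, $tf'(k) = \hat{c}$ and $\gamma \in LL(l)$.

By the abstract semantics, $\hat{\varsigma} = (call', st, h)$ where $st = st'[u' \mapsto \hat{d}']$.

So, $st = tf :: st_e$ which satisfies (2a). Also, $tf = tf'[u' \mapsto \hat{d}']$ where $u' \in LV(l)$
because $\gamma \in LL(l)$, and $u' \neq u$ because the program is α-tized. Thus, $dom(tf) =
dom(tf') \cup \{u'\} \subseteq LV(l)$, and $tf(u) = tf'(u) \sqsubseteq \hat{d}$, and $tf(k) = tf'(k) = \hat{c}$. Last, the
label of $call'$ is in $LL(l)$ because $\gamma \in LL(l)$.

a1.3) $\hat{\varsigma}'$ is a UApply

Then, $\hat{\varsigma}' = \hat{\varsigma}_e$ because $\hat{\varsigma}_e = CE_p(\hat{\varsigma}')$. This case is simple.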

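a2) $p \equiv \hat{\varsigma}_e \leadsto^+ \hat{\varsigma}_2 \leadsto \hat{\varsigma}_3 \leadsto^+ \hat{\varsigma}' \leadsto \hat{\varsigma}$

Here, the third branch of the definition of $CE_p$ determines the shape of $p$, so $\hat{\varsigma}_2$ is a
call, $\hat{\varsigma}_e = CE_p(\hat{\varsigma}_2)$, $\hat{\varsigma}'$ is a CEval exit and $\hat{\varsigma}_3 \in CE^*_p(\hat{\varsigma}')$.

By IH for $\hat{\varsigma}_e \leadsto^+ \hat{\varsigma}_2$ we get $\hat{\varsigma}_2 = ([\![(f_2\ e_2\ (\lambda_{\gamma_2} (u_2)\ call_2))^{l_2}]\!], st_2, h_2)$, where $st_2 = tf_2 :: st_e$,
$dom(tf_2) \subseteq LV(l)$, $tf_2(u) \sqsubseteq \hat{d}$, $tf_2(k) = \hat{c}$ and $l_2 \in LL(l)$.
By the abstract semantics for $\hat{\varsigma}_2 \leadsto \hat{\varsigma}_3$ we get
$\hat{\varsigma}_3 = ([\![(\lambda_{l_3} (u_3\ k_3)\ call_3)]\!], \hat{d}_3, \hat{c}_3, st_3, h_2)$,
where $[\![(\lambda_{l_3} (u_3\ k_3)\ call_3)]\!] \in \hat{\mathcal{A}}_u(f_2, l_2, st_2, h_2)$,
$\hat{d}_3 = \hat{\mathcal{A}}_u(e_2, l_2, st_2, h_2)$, $\hat{c}_3 = [\![(\lambda_{\gamma_2} (u_2)\ call_2)]\!]$ and
either $st_3 = st_2$, if $(Lam?(f_2) \vee H?(l_2, f_2))$ holds,
or $st_3 = st_2[f_2 \mapsto \{[\![(\lambda_{l_3} (u_3\ k_3)\ call_3)]\!]\}]$, if $S?(l_2, f_2)$ holds.
a2.1) $S?(l_2, f_2)$

Then, $st_3 = tf_2[f_2 \mapsto \{[\![(\lambda_{l_3} (u_3\ k_3)\ call_3)]\!]\}] :: st_e$.

By IH for $\hat{\varsigma}_3 \leadsto^+ \hat{\varsigma}'$ we get $\hat{\varsigma}' = ([\![(k'\ e')^{\gamma'}]\!], st', h')$,
where $st' = tf' :: st_3$ and $tf'(k') = [\![(\lambda_{\gamma_2} (u_2)\ call_2)]\!]$.

Thus, by the abstract semantics for $\hat{\varsigma}' \leadsto \hat{\varsigma}$ we get
$\hat{\varsigma} = ([\![(\lambda_{\gamma_2} (u_2)\ call_2)]\!], \hat{d}', st_3, h')$.

Now, $\gamma_2 \in LL(l)$ follows from $l_2 \in LL(l)$.

Also, $st_3 = tf :: st_e$ where $tf = tf_2[f_2 \mapsto \{[\![(\lambda_{l_3} (u_3\ k_3)\ call_3)]\!]\}]$.
Then, $dom(tf) = dom(tf_2) \cup \{f_2\} \subseteq LV(l)$ because $S?(l_2, f_2)$ implies $f_2 \in LV(l)$.
Also, $tf(k) = tf_2(k) = \hat{c}$. Last, we take cases depending on whether $u$ and $f_2$ are the
same variable or not.

• $u = f_2$

$tf(u) = \{[\![(\lambda_{l_3} (u_3\ k_3)\ call_3)]\!]\} \sqsubseteq \hat{\mathcal{A}}_u(f_2, l_2, st_2, h_2) = st_2(f_2) = tf_2(f_2) = tf_2(u) \sqsubseteq \hat{d}$

• $u \neq f_2$

$tf(u) = tf_2(u) \sqsubseteq \hat{d}$
a2.2) $Lam?(f_2) \vee H?(l_2, f_2)$

This case is simpler than the previous case because $st_3 = st_2$.
b) $\hat{\varsigma}_e \neq CE_p(\hat{\varsigma})$ (but $\hat{\varsigma}_e \in CE^*_p(\hat{\varsigma})$)

Then, the second branch of the definition of $CE^*_p$ determines the shape of $p$;
$p \equiv \hat{\varsigma}_e \leadsto^+ \hat{\varsigma}_1 \leadsto \hat{\varsigma}_2 \leadsto^* \hat{\varsigma}$ where $\hat{\varsigma}_1$ is a tail call, $\hat{\varsigma}_2 = CE_p(\hat{\varsigma})$ and $\hat{\varsigma}_e \in CE^*_p(\hat{\varsigma}_1)$.

By IH for $\hat{\varsigma}_e \leadsto^+ \hat{\varsigma}_1$ we get $\hat{\varsigma}_1 = ([\![(f_1\ e_1\ k_1)^{l_1}]\!], st_1, h_1)$,
where $st_1 = tf_1 :: st_e$, $tf_1(k_1) = \hat{c}$.

By the abstract semantics, $\hat{\varsigma}_2 = ([\![(\lambda_{l_2} (u_2\ k_2)\ call_2)]\!], \hat{d}_2, \hat{c}, st_e, h_1)$.
b.1) $\hat{\varsigma}$ is an entry

Then, $\hat{\varsigma} = \hat{\varsigma}_2$ because $\hat{\varsigma}_2 = CE_p(\hat{\varsigma})$. So, $st = st_e$.
b.2) $\hat{\varsigma}$ is not an entry

By IH for $\hat{\varsigma}_2 \leadsto^* \hat{\varsigma}$ we get $st = tf :: st_e$ and $tf(k_2) = \hat{c}$. This is the desired result for
$\hat{\varsigma}_e \leadsto^* \hat{\varsigma}$. □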

Lemma A.10 (Local simulation).

If $\hat{\varsigma} \leadsto \hat{\varsigma}'$ and $succ(|\hat{\varsigma}|_{al}) \neq \emptyset$, then $|\hat{\varsigma}'|_{al} \in succ(|\hat{\varsigma}|_{al})$.

Proof. By cases on the abstract transition.

We only show the lemma for [UEA], the other cases are similar.

$([\![(f\ e\ q)^l]\!], st, h) \leadsto (f', \hat{d}, \hat{c}, st', h)$
$f' \in \hat{\mathcal{A}}_u(f, l, st, h)$
$\hat{d} = \hat{\mathcal{A}}_u(e, l, st, h)$
$\hat{c} = \hat{\mathcal{A}}_k(q, st)$
$st' = \begin{cases} pop(st) & Var?(q) \\ st & Lam?(q) \wedge (H?(l, f) \vee Lam?(f)) \\ st[f \mapsto \{f'\}] & Lam?(q) \wedge S?(l, f) \end{cases}$

A UEval state has a successor only when its stack is not empty, so $st = tf :: st''$.
Thus, $|st|_{al} = \{(v, tf(v)) : v \in dom(tf) \wedge UVar?(v)\}$.
Then, $|\hat{\varsigma}|_{al} = ([\![(f\ e\ q)^l]\!], |st|_{al}, h)$. Also, $|\hat{\varsigma}'|_{al} = (f', \hat{d}, h)$.

It suffices to show that $f' \in \tilde{\mathcal{A}}_u(f, l, |st|_{al}, h)$ and $\hat{d} = \tilde{\mathcal{A}}_u(e, l, |st|_{al}, h)$; but these hold
because $\tilde{\mathcal{A}}_u(v, \psi, |st|_{al}, h) = \hat{\mathcal{A}}_u(v, \psi, st, h)$ is true for any $v$ (uvar or ulam). □
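
The abstraction $|st|_{al}$ used in this proof keeps only the user-variable bindings of the top frame. A minimal sketch, assuming stacks are lists of dict frames with the top frame first and that $UVar?$ is supplied as a predicate; returning the empty set for an empty stack is an assumption of this example, since the proof only considers non-empty stacks.

```python
def abstract_local_stack(st, is_user_var):
    """|st|_al: the user-variable bindings of the top frame, as a frozen set of pairs."""
    if not st:
        return frozenset()  # assumption: nothing to keep for an empty stack
    tf = st[0]
    return frozenset((v, frozenset(tf[v])) for v in tf if is_user_var(v))


# Hypothetical naming convention: continuation variables start with "k".
def is_user_var(v):
    return not v.startswith("k")


st = [{"u": {"lam1"}, "k": {"clam2"}}, {"w": {"lam3"}}]
print(abstract_local_stack(st, is_user_var))
```

Frames below the top one and continuation bindings are dropped, which is why the local semantics cannot see the rest of the stack.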

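Lemma A.11 (Converse of Local Simulation).

If $\tilde{\varsigma} \approx> \tilde{\varsigma}'$ then, for any $\hat{\varsigma}$ such that $\tilde{\varsigma} = |\hat{\varsigma}|_{al}$, there exists a state $\hat{\varsigma}'$ such that $\hat{\varsigma} \leadsto \hat{\varsigma}'$ and
$\tilde{\varsigma}' = |\hat{\varsigma}'|_{al}$. □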



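Lemma A.12 (Path decomposition). Let $p \equiv \hat{\varsigma}_e \leadsto^* \hat{\varsigma}$ be push monotonic and $\hat{\varsigma}_e =
([\![(\lambda_l (u\ k)\ call)]\!], \hat{d}, \hat{c}, st_e, h_e)$.

• If $\hat{\varsigma}$ is a CApply of the form $(\hat{c}, \dots, st_e, \dots)$ then $CE_p(\hat{\varsigma})$ is not defined.

• Otherwise,

(1) $CE_p(\hat{\varsigma})$ is defined, i.e., $p \equiv \hat{\varsigma}_e \leadsto^* \hat{\varsigma}_1 \leadsto^* \hat{\varsigma}$, where $\hat{\varsigma}_1 = CE_p(\hat{\varsigma})$.

(2) Regarding the set $CE^*_p(\hat{\varsigma})$, $p$ can be in one of four forms

(a) $p \equiv \hat{\varsigma}_e \leadsto^* \hat{\varsigma}$ where $\hat{\varsigma}_e = CE_p(\hat{\varsigma})$ and $CE^*_p(\hat{\varsigma}) = \{\hat{\varsigma}_e\}$

(b) $p \equiv e_1 \leadsto^+ c_1 \leadsto \dots \leadsto e_k \leadsto^+ c_k \leadsto \hat{\varsigma}_1 \leadsto^* \hat{\varsigma}$, $k > 0$, where the $e_i$ are entries, the $c_i$ are
tail calls, $e_1 = \hat{\varsigma}_e$, $e_i = CE_p(c_i)$, $\hat{\varsigma}_1 = CE_p(\hat{\varsigma})$ and $CE^*_p(\hat{\varsigma}) = \{e_1, \dots, e_k, \hat{\varsigma}_1\}$

(c) $p \equiv \hat{\varsigma}_e \leadsto^+ c \leadsto \hat{\varsigma}_1 \leadsto^* \hat{\varsigma}$ where $c$ is a call, $\hat{\varsigma}_1 = CE_p(\hat{\varsigma})$ and $CE^*_p(\hat{\varsigma}) = \{\hat{\varsigma}_1\}$

(d) $p \equiv \hat{\varsigma}_e \leadsto^+ c \leadsto e_1 \leadsto^+ c_1 \leadsto \dots \leadsto e_k \leadsto^+ c_k \leadsto \hat{\varsigma}_1 \leadsto^* \hat{\varsigma}$, $k > 0$, where
$c$ is a call, the $e_i$ are entries, the $c_i$ are tail calls, $e_i = CE_p(c_i)$, $\hat{\varsigma}_1 = CE_p(\hat{\varsigma})$ and
$CE^*_p(\hat{\varsigma}) = \{e_1, \dots, e_k, \hat{\varsigma}_1\}$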

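Proof. By induction on the length of $p$.
Basecase: $\hat{\varsigma}_e \leadsto^0 \hat{\varsigma}_e$
Then, $\hat{\varsigma} = \hat{\varsigma}_e \Rightarrow \hat{\varsigma}_e = CE_p(\hat{\varsigma}) \Rightarrow CE^*_p(\hat{\varsigma}) = \{\hat{\varsigma}_e\} \Rightarrow$ (2a) holds.

Inductive step: $\hat{\varsigma}_e \leadsto^* \hat{\varsigma}' \leadsto \hat{\varsigma}$
Cases on $\hat{\varsigma}'$:

a) $\hat{\varsigma}'$ is a Call

Then, $\hat{\varsigma}$ is an entry so $CE_p(\hat{\varsigma}) = \hat{\varsigma}$. Also, $CE^*_p(\hat{\varsigma}) = \{\hat{\varsigma}\}$ so (2c) holds.

b) $\hat{\varsigma}'$ is a Tail Call

Then, $\hat{\varsigma}$ is an entry so $CE_p(\hat{\varsigma}) = \hat{\varsigma}$.

To show (2), we take cases on whether (2a), (2b), (2c) or (2d) holds for $\hat{\varsigma}'$.

b.1) (2a) holds for $\hat{\varsigma}'$, i.e.,

$p \equiv \hat{\varsigma}_e \leadsto^* \hat{\varsigma}' \leadsto \hat{\varsigma}$ where $\hat{\varsigma}_e = CE_p(\hat{\varsigma}')$ and $CE^*_p(\hat{\varsigma}') = \{\hat{\varsigma}_e\}$. By the second branch
of the definition of $CE^*_p$, $CE^*_p(\hat{\varsigma}') \subseteq CE^*_p(\hat{\varsigma})$. Hence, $CE^*_p(\hat{\varsigma}) = \{\hat{\varsigma}_e, \hat{\varsigma}\}$, which implies
that (2b) holds for $\hat{\varsigma}$.

b.2) (2b) holds for $\hat{\varsigma}'$

By a similar argument, we find that (2b) holds for $\hat{\varsigma}$.
b.3) (2c) holds for $\hat{\varsigma}'$

By a similar argument, we find that (2d) holds for $\hat{\varsigma}$.
b.4) (2d) holds for $\hat{\varsigma}'$

By a similar argument, we find that (2d) holds for $\hat{\varsigma}$.

c) $\hat{\varsigma}'$ is a CApply of the form $(\hat{c}, \dots, st_e, \dots)$

Then, in the transition $\hat{\varsigma}' \leadsto \hat{\varsigma}$ we modify the top frame of $st_e$, which means that $p$ isn't
push monotonic. Thus, this case can't arise.

d) $\hat{\varsigma}'$ is an inner CEval or a CApply not of the form $(\hat{c}, \dots, st_e, \dots)$

By IH, $p \equiv \hat{\varsigma}_e \leadsto^* \hat{\varsigma}_1 \leadsto^* \hat{\varsigma}' \leadsto \hat{\varsigma}$ where $\hat{\varsigma}_1 = CE_p(\hat{\varsigma}')$.
By the second branch of the definition of $CE_p$, $\hat{\varsigma}_1 = CE_p(\hat{\varsigma})$.

To show (2), we take cases on whether (2a), (2b), (2c) or (2d) holds for $\hat{\varsigma}'$. The reasoning
is the same as in case (b).

e) $\hat{\varsigma}'$ is a CEval exit

By IH, $p \equiv \hat{\varsigma}_e \leadsto^* \hat{\varsigma}_1 \leadsto^* \hat{\varsigma}' \leadsto \hat{\varsigma}$ where $\hat{\varsigma}_1 = CE_p(\hat{\varsigma}')$.
Cases on (2a), (2b), (2c) or (2d) for $\hat{\varsigma}'$.

e.1) (2a) holds for $\hat{\varsigma}'$, i.e.

$p \equiv \hat{\varsigma}_e \leadsto^* \hat{\varsigma}' \leadsto \hat{\varsigma}$ where $\hat{\varsigma}_e = CE_p(\hat{\varsigma}')$.

By lemma A.9 the stack of $\hat{\varsigma}'$ is of the form $tf :: st_e$ and $tf(k) = \hat{c}$. Thus, $\hat{\varsigma} =
(\hat{c}, \dots, st_e, \dots)$. The only way for $CE_p(\hat{\varsigma})$ to exist is by the third branch of the definition
of $CE_p$, since $\hat{\varsigma}'$ is a CEval exit. But there is no call leading to $\hat{\varsigma}_e$, thus $CE_p(\hat{\varsigma})$ can't
exist.

Similarly when (2b) holds for $\hat{\varsigma}'$.
e.2) (2c) holds for $\hat{\varsigma}'$, i.e.

$p \equiv \hat{\varsigma}_e \leadsto^+ c \leadsto \hat{\varsigma}_1 \leadsto^* \hat{\varsigma}' \leadsto \hat{\varsigma}$ where $c$ is a call and $\hat{\varsigma}_1 = CE_p(\hat{\varsigma}')$.

By IH, $CE_p(c)$ exists so $p$ can be written $p \equiv \hat{\varsigma}_e \leadsto^* \hat{\varsigma}_2 \leadsto^+ c \leadsto \hat{\varsigma}_1 \leadsto^+ \hat{\varsigma}' \leadsto \hat{\varsigma}$ where
$\hat{\varsigma}_2 = CE_p(c)$. Then, by the third branch of the definition of $CE_p$, $CE_p(\hat{\varsigma}) = CE_p(c) = \hat{\varsigma}_2$.

To show (2) for $\hat{\varsigma}$ we work as in the previous cases.
f) $\hat{\varsigma}'$ is an Entry

This case is simple. □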



Lemma A.13 (Stack irrelevance).

Let $p \equiv \hat{\varsigma}_1 \leadsto \hat{\varsigma}_2 \leadsto \dots \leadsto \hat{\varsigma}_n$ be push monotonic, where $\hat{\varsigma}_1 = (ulam, \hat{d}, \hat{c}, st_e, h_e)$. Also, $\hat{\varsigma}_n$ is
not a CApply of the form $(\hat{c}, \dots, st_e, \dots)$. By property A.7, the stack of each $\hat{\varsigma}_i$ is of the
form $append(st_i, st_e)$.

For an arbitrary stack $st'$ and continuation $\hat{c}'$, consider the sequence $p'$ of states $\hat{\varsigma}'_1 \dots \hat{\varsigma}'_n$
where each $\hat{\varsigma}'_i$ is produced by $\hat{\varsigma}_i$ as follows:

• if $\hat{\varsigma}_i$ is an entry with stack $st_e$ then replace the continuation argument with $\hat{c}'$ and the
stack with $st'$.

• if $st_e$ is a proper suffix of the stack of $\hat{\varsigma}_i$ then the latter has the form $append(st'_i, \langle fr_i \rangle, st_e)$
for some stack $st'_i$. Change $st_e$ to $st'$ and bind the continuation variable in $fr_i$ to $\hat{c}'$.

(Note: the map isn't total, but it should be defined for all states in $p$.)
Then,

• for any two states $\hat{\varsigma}'_i$ and $\hat{\varsigma}'_{i+1}$ in $p'$, it holds that $\hat{\varsigma}'_i \leadsto \hat{\varsigma}'_{i+1}$

• the path $p'$ is push monotonic

Proof. By induction on the length of $p$.

The basecase is simple.

Inductive step: $p \equiv \hat{\varsigma}_1 \leadsto^* \hat{\varsigma}_{n-1} \leadsto \hat{\varsigma}_n$

By IH, the transitions in the path $\hat{\varsigma}'_1 \leadsto^* \hat{\varsigma}'_{n-1}$ are valid with respect to the abstract semantics
and the path is push monotonic. We must show that $\hat{\varsigma}'_{n-1} \leadsto \hat{\varsigma}'_n$ and that $\hat{\varsigma}'_1 \leadsto^* \hat{\varsigma}'_n$ is
push monotonic.
Cases on $\hat{\varsigma}_{n-1}$:
(1) $\hat{\varsigma}_{n-1}$ is a UEval, of the form $([\![(f\ e\ q)^l]\!], st, h)$

By lemma A.12, $CE_p(\hat{\varsigma}_{n-1})$ is defined and $p$ can be in one of four forms. We consider
only the first case, the rest are similar.

Let $p \equiv \hat{\varsigma}_1 \leadsto^+ \hat{\varsigma}_{n-1} \leadsto \hat{\varsigma}_n$ where $\hat{\varsigma}_1 = CE_p(\hat{\varsigma}_{n-1})$.

By lemma A.9, $st$ is of the form $tf :: st_e$ and the continuation variable in $tf$ (call it $k$)
is bound to $\hat{c}$.

(a) $q$ is a variable

By the abstract semantics we have that $\hat{\varsigma}_n$ is $(ulam_n, \hat{d}_n, \hat{c}, st_e, h)$. Also, the state
$\hat{\varsigma}'_{n-1}$ is $([\![(f\ e\ q)^l]\!], tf[k \mapsto \hat{c}'] :: st', h)$, and it transitions to $(ulam_n, \hat{d}_n, \hat{c}', st', h)$
which is $\hat{\varsigma}'_n$.

(b) $q$ is a lambda and $f$ is a stack reference

Then, $\hat{\varsigma}_n$ is $(ulam_n, \hat{d}_n, q, tf[f \mapsto \{ulam_n\}] :: st_e, h)$.

Also, the state $\hat{\varsigma}'_{n-1}$ is $([\![(f\ e\ q)^l]\!], tf[k \mapsto \hat{c}'] :: st', h)$, and it transitions to
$(ulam_n, \hat{d}_n, q, tf[k \mapsto \hat{c}'][f \mapsto \{ulam_n\}] :: st', h)$ which is $\hat{\varsigma}'_n$.

(c) $q$ is a lambda and $f$ is a heap reference
Similarly.

(2) $\hat{\varsigma}_{n-1}$ is a CEval exit

By lemma A.12, $CE_p(\hat{\varsigma}_{n-1})$ is defined and $p$ can be in one of four forms.

(a) $p \equiv \hat{\varsigma}_1 \leadsto^+ \hat{\varsigma}_{n-1} \leadsto \hat{\varsigma}_n$, where $\hat{\varsigma}_1 = CE_p(\hat{\varsigma}_{n-1})$

Then, by lemma A.9 and the abstract semantics, it is easy to see that $\hat{\varsigma}_n$ is of the
form $(\hat{c}, \dots, st_e, \dots)$. Thus, this case isn't possible.
Similarly when $\hat{\varsigma}_1 \neq CE_p(\hat{\varsigma}_{n-1})$ but is in $CE^*_p(\hat{\varsigma}_{n-1})$.

(b) $p \equiv \hat{\varsigma}_1 \leadsto^+ c \leadsto \hat{\varsigma}'_e \leadsto^+ \hat{\varsigma}_{n-1} \leadsto \hat{\varsigma}_n$ where $\hat{\varsigma}'_e = CE_p(\hat{\varsigma}_{n-1})$ and $c$ is a call:

Then, $CE_p(c)$ is defined and its stack has $st_e$ as a suffix. Hence, by lemma A.9 the
stack of $c$ is bigger than $st_e$ by at least a frame. Since the stack of $\hat{\varsigma}'_e$ has the same
size as the stack of $c$, the stack of $\hat{\varsigma}_{n-1}$ is bigger than $st_e$ by at least two frames.
By lemma A.4 we get the desired result.
Similarly when $\hat{\varsigma}'_e \neq CE_p(\hat{\varsigma}_{n-1})$ but is in $CE^*_p(\hat{\varsigma}_{n-1})$.

(3) $\hat{\varsigma}_{n-1}$ is an inner CEval
Similarly to the previous cases.

(4) $\hat{\varsigma}_{n-1}$ is a UApply

Lemma A.12 gives the same four cases. We only consider one, the rest are similar.
Let $p \equiv \hat{\varsigma}_1 \leadsto^+ c \leadsto \hat{\varsigma}_{n-1} \leadsto \hat{\varsigma}_n$ where $c$ is a call.

Then, $CE_p(c)$ is defined and its stack has $st_e$ as a suffix. Hence, by lemma A.9 the
stack of $c$ is bigger than $st_e$ by at least a frame. Since the stack of $\hat{\varsigma}_{n-1}$ has the same
size as the stack of $c$, we don't change the continuation argument in $\hat{\varsigma}'_{n-1}$. By lemma
A.5 we get the desired result.

(5) $\hat{\varsigma}_{n-1}$ is a CApply

Similarly to the previous cases. □



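Theorem A.14 (Soundness).

If $p \equiv \hat{\mathcal{I}}(pr) \leadsto^* \hat{\varsigma}$ then, after summarization:

• if $\hat{\varsigma}$ is not a final state then $(|CE_p(\hat{\varsigma})|_{al}, |\hat{\varsigma}|_{al}) \in Seen$

• if $\hat{\varsigma}$ is a final state then $|\hat{\varsigma}|_{al} \in Final$

• if $\hat{\varsigma}$ is a CEval exit and $\hat{\varsigma}' \in CE^*_p(\hat{\varsigma})$ then $(|\hat{\varsigma}'|_{al}, |\hat{\varsigma}|_{al}) \in Seen$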

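Proof. By induction on the length of $p$.
Basecase: $\hat{\mathcal{I}}(pr) \leadsto^0 \hat{\mathcal{I}}(pr)$
Then, $(|\hat{\mathcal{I}}(pr)|_{al}, |\hat{\mathcal{I}}(pr)|_{al}) \in Seen$.

Inductive step: $\hat{\mathcal{I}}(pr) \leadsto^* \hat{\varsigma}' \leadsto \hat{\varsigma}$
Cases on $\hat{\varsigma}$:

a) $\hat{\varsigma}$ is an Entry
Then, $CE_p(\hat{\varsigma}) = \hat{\varsigma}$. Also, $\hat{\varsigma}'$ is a call or a tail call.

By lemma A.12, $p \equiv \hat{\mathcal{I}}(pr) \leadsto^* \hat{\varsigma}_1 \leadsto^+ \hat{\varsigma}' \leadsto \hat{\varsigma}$, where $\hat{\varsigma}_1 = CE_p(\hat{\varsigma}')$.
By IH, $(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}'|_{al}) \in Seen$, which means that it has been entered in $W$ and examined.
By lemma A.10, $|\hat{\varsigma}|_{al} \in succ(|\hat{\varsigma}'|_{al})$ so in line 10 or 22 $(|\hat{\varsigma}|_{al}, |\hat{\varsigma}|_{al})$ will be propagated.

b) $\hat{\varsigma}$ is a CApply but not a final state

Then, $\hat{\varsigma} = ([\![(\lambda_\gamma (u)\ call)]\!], \hat{d}, st, h)$ and $\hat{\varsigma}' = ([\![(q\ e)^{\gamma'}]\!], st', h)$.

b.1) $Lam?(q)$, i.e. $\hat{\varsigma}'$ is an inner CEval
This case is simple.

b.2) $Var?(q)$, i.e. $\hat{\varsigma}'$ is a CEval exit

The path $\hat{\mathcal{I}}(pr) \leadsto^* \hat{\varsigma}'$ satisfies part 2 of lemma A.12. It can't satisfy cases 2a or 2b
because $\hat{\varsigma}$ would be a final state by lemma A.9. Thus, it satisfies 2c or 2d. Then, the
path is of the form $p \equiv \hat{\mathcal{I}}(pr) \leadsto^* \hat{\varsigma}_1 \leadsto^+ \hat{\varsigma}_2 \leadsto \hat{\varsigma}_3 \leadsto^+ \hat{\varsigma}' \leadsto \hat{\varsigma}$
where $\hat{\varsigma}_2$ is a call, $\hat{\varsigma}_1 = CE_p(\hat{\varsigma}_2)$ and $\hat{\varsigma}_3 \in CE^*_p(\hat{\varsigma}')$. Note that by the third branch of the
definition of $CE_p$, $\hat{\varsigma}_1 = CE_p(\hat{\varsigma})$. We must show that $(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}|_{al}) \in Seen$.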

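The state $\hat{\varsigma}_1$ is an entry of the form $\hat{\varsigma}_1 = ([\![(\lambda_{l_1} (u_1\ k_1)\ call_1)]\!], \hat{d}_1, \hat{c}_1, st_1, h_1)$.

The state $\hat{\varsigma}_2$ is a call of the form $\hat{\varsigma}_2 = ([\![(f_2\ e_2\ q_2)^{l_2}]\!], st_2, h_2)$, where $q_2$ is a clam.

Lemma A.9 for $\hat{\varsigma}_1 \leadsto^+ \hat{\varsigma}_2$ gives $st_2 = tf_2 :: st_1$.

By the abstract semantics for $\hat{\varsigma}_2 \leadsto \hat{\varsigma}_3$, we get:

$\hat{\varsigma}_3 = (ulam, \hat{d}_3, q_2, st_3, h_2)$, where
either $st_3 = st_2$, if $(Lam?(f_2) \vee H?(l_2, f_2))$ holds,
or $st_3 = st_2[f_2 \mapsto \{ulam\}]$, if $S?(l_2, f_2)$ holds,

i.e. $st_3 = tf_3 :: st_1$, and
$tf_3 = \begin{cases} tf_2 & Lam?(f_2) \vee H?(l_2, f_2) \\ tf_2[f_2 \mapsto \{ulam\}] & S?(l_2, f_2) \end{cases}$

By lemma A.9 for $\hat{\varsigma}_3 \leadsto^+ \hat{\varsigma}'$, we get $st' = tf' :: st_3$ and $tf'(q) = q_2$.

Then, by the abstract semantics for $\hat{\varsigma}' \leadsto \hat{\varsigma}$,
$q_2 = [\![(\lambda_\gamma (u)\ call)]\!]$, $st = st_3$, and $\hat{d} = \hat{\mathcal{A}}_u(e, \gamma', st', h)$.

The above information will become useful when dealing with the local counterparts of
the aforementioned states.

By IH, $(|\hat{\varsigma}_3|_{al}, |\hat{\varsigma}'|_{al})$ was entered in $W$ (at line 25) and later examined at line 13. Note
that $\hat{\varsigma}_3 \neq \hat{\mathcal{I}}(pr)$ because $\hat{\varsigma}_2$ is between them, therefore Final will not be called at line
15.

Also by IH, $(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}_2|_{al})$ was entered in $W$ and later examined. Lemma A.10 implies
that $|\hat{\varsigma}_3|_{al} \in succ(|\hat{\varsigma}_2|_{al})$ so $(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}_2|_{al}, |\hat{\varsigma}_3|_{al})$ will go in Callers. We take cases on
whether $(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}_2|_{al})$ or $(|\hat{\varsigma}_3|_{al}, |\hat{\varsigma}'|_{al})$ was examined first by the algorithm.
b.2.1) $(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}_2|_{al})$ was examined first

Then, when $(|\hat{\varsigma}_3|_{al}, |\hat{\varsigma}'|_{al})$ is examined, $(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}_2|_{al}, |\hat{\varsigma}_3|_{al})$ is in Callers.

Therefore, at line 18 we call Update$(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}_2|_{al}, |\hat{\varsigma}_3|_{al}, |\hat{\varsigma}'|_{al})$.

By applying $|\cdot|_{al}$ to the abstract states we get

$|\hat{\varsigma}_1|_{al} = ([\![(\lambda_{l_1} (u_1\ k_1)\ call_1)]\!], \hat{d}_1, h_1)$

$|\hat{\varsigma}_2|_{al} = ([\![(f_2\ e_2\ q_2)^{l_2}]\!], tf_2, h_2)$, where $q_2 = [\![(\lambda_\gamma (u)\ call)]\!]$.

$|\hat{\varsigma}_3|_{al} = (ulam, \hat{d}_3, h_2)$

$|\hat{\varsigma}'|_{al} = ([\![(q\ e)^{\gamma'}]\!], tf', h)$, where $tf'(q) = [\![(\lambda_\gamma (u)\ call)]\!]$.

By looking at Update's code, we see that the return value is $\tilde{\mathcal{A}}_u(e, \gamma', tf', h) =
\hat{\mathcal{A}}_u(e, \gamma', st', h) = \hat{d}$. The frame of the return state is

$\begin{cases} tf_2 & Lam?(f_2) \vee H?(l_2, f_2) \\ tf_2[f_2 \mapsto \{ulam\}] & S?(l_2, f_2) \end{cases}$

which is equal to $tf_3$. The heap at the return state is $h$. Last, the continuation we
are returning to is $[\![(\lambda_\gamma (u)\ call)]\!]$. Thus, the return state $\tilde{\varsigma}$ is equal to $|\hat{\varsigma}|_{al}$, and we
call Propagate$(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}|_{al})$, so $(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}|_{al})$ will go in Seen.
b.2.2) $(|\hat{\varsigma}_3|_{al}, |\hat{\varsigma}'|_{al})$ was examined first

Then, when $(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}_2|_{al})$ is examined, $(|\hat{\varsigma}_3|_{al}, |\hat{\varsigma}'|_{al})$ is in Summary, and at line 12 we
call Update$(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}_2|_{al}, |\hat{\varsigma}_3|_{al}, |\hat{\varsigma}'|_{al})$.
Proceed as above.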

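c) $\hat{\varsigma}$ is a final state

Then, $\hat{\varsigma} = (halt, \hat{d}, \langle\rangle, h)$. We must show that $|\hat{\varsigma}|_{al}$ will be in Final after the execution of
the summarization algorithm. By the abstract semantics for $\hat{\varsigma}' \leadsto \hat{\varsigma}$, $\hat{\varsigma}' = ([\![(k\ e)^\gamma]\!], st', h)$,
where $st' = tf' :: \langle\rangle$, $tf'(k) = halt$, and $\hat{d} = \hat{\mathcal{A}}_u(e, \gamma, st', h)$.

By IH for $\hat{\mathcal{I}}(pr) \leadsto^* \hat{\varsigma}'$, we know that $(|\hat{\mathcal{I}}(pr)|_{al}, |\hat{\varsigma}'|_{al})$ was entered in $W$ and Summary
sometime during the algorithm. When it was examined, the test at line 14 was true so
we called Final$(|\hat{\varsigma}'|_{al})$. Hence, we insert $\tilde{\varsigma} = (halt, \tilde{\mathcal{A}}_u(e, \gamma, tf', h), \emptyset, h)$ in Final. But,
$\tilde{\mathcal{A}}_u(e, \gamma, tf', h) = \hat{\mathcal{A}}_u(e, \gamma, st', h) = \hat{d}$, hence $\tilde{\varsigma} = |\hat{\varsigma}|_{al}$.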

d) <f is a CEv al exit 



By lemma A. 12 for I(pr) <f', p = I(pr) ?i ^* <f' <f, where <fi = CE p (q'). But 

<f' is not a CEval exit (it is an Apply state), so by the second branch of the definition of 
CE P we get <fi = CE p (q). 

By Iff, (|<?i|aZ) I?' I a/) is entered in Seen and VF; and examined at line 6. By lemma A.10[ 
|<f| a / £ succ(\s'\ai) so (| <fi | a i , \s \ ai) will be propagated (line 7) and entered in Seen (line 
25). 

We need to show that for every <f" G CE*(s), (|<f"|aZ, \s\ai) will be inserted in Seen. The 



path I(pr) ?' satisfies part [2j of lemma A. 12 proceed by cases: 
d.l) l(pr) <f satisfies 2a 

Then, <fi = I(pr) and p = <f 1 ^* <f ~> £ and CE*(q) = {<fi}. But we 've shown that 
(\si\al, \s\al) is entered in Seen. 
d.2) $\hat{\mathcal{I}}(pr) \leadsto^{*} \hat{\varsigma}'$ satisfies 2b

Then, $p \equiv e_1 \leadsto^{+} c_1 \leadsto \dots \leadsto e_k \leadsto^{+} c_k \leadsto \hat{\varsigma}_1 \leadsto^{*} \hat{\varsigma}' \leadsto \hat{\varsigma}$, where $e_1 = \hat{\mathcal{I}}(pr)$, the $e_i$ are
entries, the $c_i$ are tail calls, $e_i = CE_p(c_i)$, and $CE^{*}_p(\hat{\varsigma}') = \{e_1, \dots, e_k, \hat{\varsigma}_1\}$.
Hence, $CE^{*}_p(\hat{\varsigma}) = \{e_1, \dots, e_k, \hat{\varsigma}_1\}$. To show that $(|e_k|_{al}, |\hat{\varsigma}|_{al})$ is entered in Seen, we
proceed by cases on whether $(|e_k|_{al}, |c_k|_{al})$ or $(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}|_{al})$ was examined first by the
algorithm.
d.2.1) $(|e_k|_{al}, |c_k|_{al})$ was examined first

By lemma A.10, $|\hat{\varsigma}_1|_{al}$ is in $succ(|c_k|_{al})$, hence $(|e_k|_{al}, |c_k|_{al}, |\hat{\varsigma}_1|_{al})$ will go in TCallers.
Then, when $(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}|_{al})$ is examined, in line 19 we will call Propagate($|e_k|_{al}$, $|\hat{\varsigma}|_{al}$),
so $(|e_k|_{al}, |\hat{\varsigma}|_{al})$ will go in Seen.
d.2.2) $(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}|_{al})$ was examined first

When $(|e_k|_{al}, |c_k|_{al})$ is examined, $(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}|_{al})$ will be in Summary, and by lemma A.10
we know $|\hat{\varsigma}_1|_{al} \in succ(|c_k|_{al})$. Thus, in line 24 we will call Propagate, which will insert
$(|e_k|_{al}, |\hat{\varsigma}|_{al})$ in Seen.

By repeating this process $k - 1$ times, we can show that all edges $(|e_i|_{al}, |\hat{\varsigma}|_{al})$ go in
Seen.

d.3) $\hat{\mathcal{I}}(pr) \leadsto^{*} \hat{\varsigma}'$ satisfies 2c or 2d

These cases are similar to the previous cases. The only difference is that now $\hat{\mathcal{I}}(pr)$ is
not in $CE^{*}_p(\hat{\varsigma}')$ (which doesn't change the proof).
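e) $\hat{\varsigma}$ is a Tail Call (thus an exit)

By lemma A.12 for $\hat{\mathcal{I}}(pr) \leadsto^{*} \hat{\varsigma}'$, $p \equiv \hat{\mathcal{I}}(pr) \leadsto^{*} \hat{\varsigma}_1 \leadsto^{*} \hat{\varsigma}' \leadsto \hat{\varsigma}$, where $\hat{\varsigma}_1 = CE_p(\hat{\varsigma}')$. But
$\hat{\varsigma}'$ is not a CEval exit (it is an Apply state), so by the second branch of the definition of
$CE_p$ we get $\hat{\varsigma}_1 = CE_p(\hat{\varsigma})$.

By IH, $(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}'|_{al})$ is entered in Seen and W, and examined at line 6. By lemma A.10,
$|\hat{\varsigma}|_{al} \in succ(|\hat{\varsigma}'|_{al})$, so $(|\hat{\varsigma}_1|_{al}, |\hat{\varsigma}|_{al})$ will be propagated (line 7) and entered in Seen (line
25).

f) $\hat{\varsigma}$ is an inner CEval

This case is simple.

g) $\hat{\varsigma}$ is a Call

This case is simple. □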



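Theorem A.15 (Completeness).
After summarization:

• For each $(\tilde{\varsigma}_1, \tilde{\varsigma}_2)$ in Seen, there exist $\hat{\varsigma}_1$, $\hat{\varsigma}_2$ and $p$ such that $p \equiv \hat{\mathcal{I}}(pr) \leadsto^{*} \hat{\varsigma}_1 \leadsto^{*} \hat{\varsigma}_2$ and
$\tilde{\varsigma}_1 = |\hat{\varsigma}_1|_{al}$ and $\tilde{\varsigma}_2 = |\hat{\varsigma}_2|_{al}$ and $\hat{\varsigma}_1 \in CE^{*}_p(\hat{\varsigma}_2)$.

• For each $\tilde{\varsigma}$ in Final, there exist $\hat{\varsigma}$ and $p$ such that $p \equiv \hat{\mathcal{I}}(pr) \leadsto^{+} \hat{\varsigma}$ and $\tilde{\varsigma} = |\hat{\varsigma}|_{al}$ and $\hat{\varsigma}$ is
a final state.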

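Proof. By induction on the number of iterations. We prove that the algorithm maintains
the following properties for Seen and Final.

(1) For each $(\tilde{\varsigma}_1, \tilde{\varsigma}_2)$ in Seen, there exist $\hat{\varsigma}_1$, $\hat{\varsigma}_2$ and $p$ such that $p \equiv \hat{\mathcal{I}}(pr) \leadsto^{*} \hat{\varsigma}_1 \leadsto^{*} \hat{\varsigma}_2$
and $\tilde{\varsigma}_1 = |\hat{\varsigma}_1|_{al}$ and $\tilde{\varsigma}_2 = |\hat{\varsigma}_2|_{al}$ and, if $\tilde{\varsigma}_2$ is a CEval exit then $\hat{\varsigma}_1 \in CE^{*}_p(\hat{\varsigma}_2)$, otherwise
$\hat{\varsigma}_1 = CE_p(\hat{\varsigma}_2)$.

(2) For each $\tilde{\varsigma}$ in Final, there exist $\hat{\varsigma}$ and $p$ such that $p \equiv \hat{\mathcal{I}}(pr) \leadsto^{+} \hat{\varsigma}$ and $\tilde{\varsigma} = |\hat{\varsigma}|_{al}$ and
$\hat{\varsigma}$ is a final state.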
Initially, we must show that the properties hold before the first iteration (at the beginning
of the algorithm): Final is empty and W contains just $(\tilde{\mathcal{I}}(pr), \tilde{\mathcal{I}}(pr))$, for which property
1 holds.

Now the inductive step: at the beginning of each iteration, we remove an edge $(\tilde{\varsigma}_1, \tilde{\varsigma}_2)$
from W. We assume that the properties hold at that point. We must show that, after we
process the edge, the new elements of Seen and Final satisfy the properties.
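
The case analysis that follows mirrors the branches of the workset loop whose line numbers the proof cites. As a reading aid, here is a minimal sketch of such a loop; it is not the paper's algorithm verbatim. States are opaque labels, and `succ`, `kind`, `update` and `final` are hypothetical hooks standing in for the local semantics, the state classification, and the Update and Final procedures. In this sketch `update` returns the return-point state and the loop propagates it, which is a simplification.

```python
from collections import deque

def summarize(init, succ, kind, update, final):
    """Workset loop over edges of opaque local states (illustrative sketch).

    kind(s) returns "entry", "capply", "inner_ceval", "call",
    "exit_ceval" or "exit_tc"; update and final are hypothetical hooks.
    """
    seen = {(init, init)}
    work = deque([(init, init)])
    summary, callers, tcallers, finals = set(), set(), set(), set()

    def propagate(a, b):
        # An edge enters Seen and W only the first time it is discovered.
        if (a, b) not in seen:
            seen.add((a, b))
            work.append((a, b))

    while work:
        s1, s2 = work.popleft()
        k = kind(s2)
        if k in ("entry", "capply", "inner_ceval"):
            for s3 in succ(s2):
                propagate(s1, s3)
        elif k == "call":
            for s3 in succ(s2):
                propagate(s3, s3)
                callers.add((s1, s2, s3))
                for (e, x) in list(summary):
                    if e == s3:  # callee already summarized
                        propagate(s1, update(s1, s2, s3, x))
        elif k == "exit_ceval":
            if s1 == init:
                finals.add(final(s2))
            else:
                summary.add((s1, s2))
                for (c1, c2, e) in list(callers):
                    if e == s1:
                        propagate(c1, update(c1, c2, s1, s2))
                for (t1, _t2, e) in list(tcallers):
                    if e == s1:
                        propagate(t1, s2)
        elif k == "exit_tc":
            for s3 in succ(s2):
                propagate(s3, s3)
                tcallers.add((s1, s2, s3))
                for (e, x) in list(summary):
                    if e == s3:
                        propagate(s1, x)
    return seen, summary, finals

# Toy program (made up for illustration): main calls f, f returns to r1.
KIND = {"main": "entry", "c1": "call", "f": "entry",
        "fx": "exit_ceval", "r1": "capply", "mx": "exit_ceval"}
SUCC = {"main": ["c1"], "c1": ["f"], "f": ["fx"], "r1": ["mx"]}

seen, summary, finals = summarize(
    "main",
    succ=lambda s: SUCC.get(s, []),
    kind=KIND.__getitem__,
    update=lambda s1, s2, s3, s4: "r1",
    final=lambda s: "halt",
)
```

On this toy input the only summary edge is the one for `f`, and the return point `r1` is reached from `main` through that summary rather than by following a return edge in a graph.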

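• $\tilde{\varsigma}_2$ is an entry, a CApply or an inner CEval
$(\tilde{\varsigma}_1, \tilde{\varsigma}_2)$ is in Seen, so by IH

$\exists\, \hat{\varsigma}_1, \hat{\varsigma}_2, p.\ p \equiv \hat{\mathcal{I}}(pr) \leadsto^{*} \hat{\varsigma}_1 \leadsto^{*} \hat{\varsigma}_2 \wedge \tilde{\varsigma}_1 = |\hat{\varsigma}_1|_{al} \wedge \tilde{\varsigma}_2 = |\hat{\varsigma}_2|_{al} \wedge \hat{\varsigma}_1 = CE_p(\hat{\varsigma}_2)$
For each $\tilde{\varsigma}_3$ in $succ(\tilde{\varsigma}_2)$, $(\tilde{\varsigma}_1, \tilde{\varsigma}_3)$ will be propagated.

If $(\tilde{\varsigma}_1, \tilde{\varsigma}_3)$ is already in Seen then property 1 holds by IH (in the following cases, we won't
repeat this argument and will assume that the insertion in Seen happens now).
Otherwise, we insert the edge at this iteration, at line 25. By lemma A.11,

$\exists\, \hat{\varsigma}_3.\ \tilde{\varsigma}_3 = |\hat{\varsigma}_3|_{al} \wedge \hat{\varsigma}_2 \leadsto \hat{\varsigma}_3$

By the second branch of the definition of $CE_p$, $\hat{\varsigma}_1 = CE_p(\hat{\varsigma}_3)$.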

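• $\tilde{\varsigma}_2$ is a call

Let $\tilde{\varsigma}_1 = ([\![(\lambda_1 (u_1\ k_1)\ call_1)]\!], \hat{d}_1, h_1)$ and $\tilde{\varsigma}_2 = ([\![(f_2\ e_2\ (\lambda_2 (u_2)\ call_2))^{l_2}]\!], tf_2, h_2)$.
Also, assume $S_?(l_2, f_2)$ (the other cases are simpler).
$(\tilde{\varsigma}_1, \tilde{\varsigma}_2)$ is in Seen, so by IH

$\exists\, \hat{\varsigma}_1, \hat{\varsigma}_2, p.\ p \equiv \hat{\mathcal{I}}(pr) \leadsto^{*} \hat{\varsigma}_1 \leadsto^{+} \hat{\varsigma}_2 \wedge \tilde{\varsigma}_1 = |\hat{\varsigma}_1|_{al} \wedge \tilde{\varsigma}_2 = |\hat{\varsigma}_2|_{al} \wedge \hat{\varsigma}_1 = CE_p(\hat{\varsigma}_2)$

Each entry $\tilde{\varsigma}_3$ in $succ(\tilde{\varsigma}_2)$ will be propagated. By lemma A.11,

$\exists\, \hat{\varsigma}_3.\ \tilde{\varsigma}_3 = |\hat{\varsigma}_3|_{al} \wedge \hat{\varsigma}_2 \leadsto \hat{\varsigma}_3$

Since $\hat{\varsigma}_3 = CE_p(\hat{\varsigma}_3)$, property 1 holds for $\tilde{\varsigma}_3$.

If there is no edge $(\tilde{\varsigma}_3, \tilde{\varsigma}_4)$ in Summary, we are done.

Otherwise, we call Update($\tilde{\varsigma}_1$, $\tilde{\varsigma}_2$, $\tilde{\varsigma}_3$, $\tilde{\varsigma}_4$) and we must show that property 1 holds for the
edge inserted in Seen by Update.

Let $st_1$ be the stack of $\hat{\varsigma}_1$. By lemma A.9 the stack of $\hat{\varsigma}_2$ is $tf_2 :: st_1$.
Let $\tilde{\varsigma}_3 = ([\![(\lambda_3 (u_3\ k_3)\ call_3)]\!], \hat{d}_3, h_2)$ and $\tilde{\varsigma}_4 = ([\![(k_4\ e_4)^{l_4}]\!], tf_4, h_4)$.
(Note that $tf_4$ contains only user bindings.)

We know Summary $\subseteq$ Seen, so by IH for $(\tilde{\varsigma}_3, \tilde{\varsigma}_4)$ we get (note that $\tilde{\varsigma}_4$ is a CEval exit)

$\exists\, \hat{\varsigma}'_3, \hat{\varsigma}'_4, p'.\ p' \equiv \hat{\mathcal{I}}(pr) \leadsto^{*} \hat{\varsigma}'_3 \leadsto^{*} \hat{\varsigma}'_4 \wedge \tilde{\varsigma}_3 = |\hat{\varsigma}'_3|_{al} \wedge \tilde{\varsigma}_4 = |\hat{\varsigma}'_4|_{al} \wedge \hat{\varsigma}'_3 \in CE^{*}_{p'}(\hat{\varsigma}'_4)$

Then, $\hat{\varsigma}'_3 = ([\![(\lambda_3 (u_3\ k_3)\ call_3)]\!], \hat{d}_3, \hat{c}_3, st'_3, h_2)$ and by lemma A.9
$\hat{\varsigma}'_4 = ([\![(k_4\ e_4)^{l_4}]\!], tf_4[k_4 \mapsto \hat{c}_3] :: st'_3, h_4)$.
But the path from $\hat{\varsigma}'_3$ to $\hat{\varsigma}'_4$ is push monotonic, so by lemma A.13 there exist states
$\hat{\varsigma}_3 = ([\![(\lambda_3 (u_3\ k_3)\ call_3)]\!], \hat{d}_3, [\![(\lambda_2 (u_2)\ call_2)]\!], st_3, h_2)$
where $st_3 = tf_2[f_2 \mapsto \{[\![(\lambda_3 (u_3\ k_3)\ call_3)]\!]\}] :: st_1$, and $\hat{\varsigma}_4 = ([\![(k_4\ e_4)^{l_4}]\!], st_4, h_4)$
where $st_4 = tf_4[k_4 \mapsto [\![(\lambda_2 (u_2)\ call_2)]\!]] :: st_3$, such that $\hat{\varsigma}_3 \leadsto^{+} \hat{\varsigma}_4$.

Thus, the path $p$ can be extended to $\hat{\mathcal{I}}(pr) \leadsto^{*} \hat{\varsigma}_1 \leadsto^{+} \hat{\varsigma}_2 \leadsto \hat{\varsigma}_3 \leadsto^{+} \hat{\varsigma}_4$. By the abstract
semantics, the successor $\hat{\varsigma}$ of $\hat{\varsigma}_4$ is $([\![(\lambda_2 (u_2)\ call_2)]\!], \hat{\mathcal{A}}_u(e_4, l_4, st_4, h_4), st_3, h_4)$.

The state $\tilde{\varsigma}$ produced by Update is $([\![(\lambda_2 (u_2)\ call_2)]\!], \tilde{\mathcal{A}}_u(e_4, l_4, tf_4, h_4), tf, h_4)$ where
$tf = tf_2[f_2 \mapsto \{[\![(\lambda_3 (u_3\ k_3)\ call_3)]\!]\}]$. It is simple to see that $\tilde{\varsigma} = |\hat{\varsigma}|_{al}$.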

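• $\tilde{\varsigma}_2$ is a CEval exit, $([\![(k\ e)^{l_2}]\!], tf_2, h_2)$

If $\tilde{\varsigma}_1$ is $\tilde{\mathcal{I}}(pr)$ then Final($\tilde{\varsigma}_2$) is called and a local state $\tilde{\varsigma}$ of the form
$(halt, \tilde{\mathcal{A}}_u(e, l_2, tf_2, h_2), \emptyset, h_2)$ goes in Final. We must show that property 2 holds.

By IH for $(\tilde{\varsigma}_1, \tilde{\varsigma}_2)$, $\exists\, \hat{\varsigma}_2, p.\ p \equiv \hat{\mathcal{I}}(pr) \leadsto^{+} \hat{\varsigma}_2 \wedge \tilde{\varsigma}_2 = |\hat{\varsigma}_2|_{al} \wedge \hat{\mathcal{I}}(pr) \in CE^{*}_p(\hat{\varsigma}_2)$.
(Note that $\hat{\varsigma}_1 = \hat{\mathcal{I}}(pr)$.) By lemma A.9, the stack $st_2$ of $\hat{\varsigma}_2$ is $tf_2[k \mapsto halt] :: \langle\rangle$. Hence,
the successor $\hat{\varsigma}$ of $\hat{\varsigma}_2$ is $(halt, \hat{\mathcal{A}}_u(e, l_2, st_2, h_2), \langle\rangle, h_2)$, and $\tilde{\varsigma} = |\hat{\varsigma}|_{al}$ holds.

If $\tilde{\varsigma}_1 \neq \tilde{\mathcal{I}}(pr)$, for each triple $(\tilde{\varsigma}_3, \tilde{\varsigma}_4, \tilde{\varsigma}_1)$ in Callers, we call Update($\tilde{\varsigma}_3$, $\tilde{\varsigma}_4$, $\tilde{\varsigma}_1$, $\tilde{\varsigma}_2$). Insertion
in Callers happens only at line 11, which means that $(\tilde{\varsigma}_3, \tilde{\varsigma}_4)$ is in Seen. Thus, by IH

$\exists\, \hat{\varsigma}_3, \hat{\varsigma}_4, p.\ p \equiv \hat{\mathcal{I}}(pr) \leadsto^{*} \hat{\varsigma}_3 \leadsto^{*} \hat{\varsigma}_4 \wedge \tilde{\varsigma}_3 = |\hat{\varsigma}_3|_{al} \wedge \tilde{\varsigma}_4 = |\hat{\varsigma}_4|_{al} \wedge \hat{\varsigma}_3 = CE_p(\hat{\varsigma}_4)$

Also, $\tilde{\varsigma}_4 \mathrel{\approx>} \tilde{\varsigma}_1$, thus by lemma A.11, $\exists\, \hat{\varsigma}_1.\ \hat{\varsigma}_4 \leadsto \hat{\varsigma}_1 \wedge \tilde{\varsigma}_1 = |\hat{\varsigma}_1|_{al}$.

Using the IH for $(\tilde{\varsigma}_1, \tilde{\varsigma}_2)$ and lemma A.13 we can show that the edge inserted by Update
satisfies property 1 (similar to the previous case).

For each triple $(\tilde{\varsigma}_3, \tilde{\varsigma}_4, \tilde{\varsigma}_1)$ in TCallers, we call Propagate($\tilde{\varsigma}_3$, $\tilde{\varsigma}_2$). We must show that
property 1 holds for $(\tilde{\varsigma}_3, \tilde{\varsigma}_2)$. Insertion in TCallers happens only at line 23, which means
that $(\tilde{\varsigma}_3, \tilde{\varsigma}_4)$ is in Seen. By IH for $(\tilde{\varsigma}_1, \tilde{\varsigma}_2)$ and $(\tilde{\varsigma}_3, \tilde{\varsigma}_4)$ and by lemma A.13 we can show that
there are states $\hat{\varsigma}_3$ and $\hat{\varsigma}_2$ and a path $p'$ such that $\tilde{\varsigma}_3 = |\hat{\varsigma}_3|_{al}$, $\tilde{\varsigma}_2 = |\hat{\varsigma}_2|_{al}$ and $\hat{\varsigma}_3 \in CE^{*}_{p'}(\hat{\varsigma}_2)$.
Hence, property 1 holds for $(\tilde{\varsigma}_3, \tilde{\varsigma}_2)$.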
• $\tilde{\varsigma}_2$ is a tail call

Similarly. □